Answer Generation with Temporal Data Integration
Véronique Moriceau
Université Paul Sabatier - IRIT
31062 Toulouse cedex 09, France
moriceau@irit.fr
Abstract
In this paper, we propose an approach for content determination and surface generation of answers in a question-answering system on the web. Content determination is based on a coherence rate which takes into account coherence with other potential answers. Answer generation uses classical techniques and templates and is based on a certainty degree.
1 Introduction
Search engines on the web and most existing question-answering systems provide the user with either a set of hyperlinks or web page extracts containing answer(s) to a question.
As provenance information (defined in [McGuinness et al.,
2004] e.g., source, date, author, etc.) is rather difficult to ob-
tain, we assume that all web pages are equally reliable. Then, the problem the system has to solve is to generate an answer to a question even if several possible answers are selected by the extraction engine. For this purpose, we propose to integrate, according to certain criteria, the different possible answers in order to generate a single coherent answer which takes into account the diversity of answers (which can be redundant, incomplete, inconsistent, etc.).
As our framework is WEBCOOP [Benamara, 2004], a cooperative question-answering system on the web, our goal is to generate answers in natural language which explain how confident the user can be in the answer.
In this paper, we focus on aspects of content determination
and on the generation of answers in natural language. In the
following sections, we first present the main difficulties and a
general typology of integration mechanisms. Then we anal-
yse the content determination process in the case of answers
of type date. Finally, we present briefly a few elements about
generation of integrated answers and evaluation.
2 Motivations
When a user submits a question to a classical search engine or question-answering system, he may obtain a set of potential answers which may be incoherent to some degree: by incoherent, we mean answers that are a priori contradictory but which can in fact be equivalent, complementary, etc. In this case, the user may be unsatisfied because he does not know which answer among those proposed is the correct one.
In the following sections, we present related works and a
general typology of relations between candidate answers.
2.1 Related works
Most existing systems on the web produce a set of answers to a question in the form of hyperlinks or page extracts,
ranked according to a relevance score (for example, COGEX
[Moldovan et al., 2003]). Other systems also define relation-
ships between web page extracts or texts containing possi-
ble answers ([Harabagiu et al., 2004], [Radev et al., 1998]).
For example, [Webber et al., 2002] defines 4 relationships between possible answers:
- equivalence: equivalent answers which entail each other,
- inclusion: one-way entailment of answers,
- aggregation: answers that are mutually consistent but not entailing, and that can be replaced by their conjunction,
- alternative: answers that are inconsistent or alternatives and that can be replaced by their disjunction.
Most question-answering systems generate answers which take into account neither the information given by all candidate answers nor their inconsistency. This is the point we focus on in the following section.
2.2 A general typology of integration mechanisms
To better characterise our problem, we collected, via Google
or QRISTAL [QRISTAL], a corpus of around 100 question-
answer pairs in French that reflect different inconsistency
problems. We first assume that all candidate answers are po-
tentially correct. The corpus analysis enables us to define a
general typology of relations between answers. For each rela-
tion defined in [Webber et al., 2002], we identify integration
mechanisms in order to generate answers which take into ac-
count characteristics of all candidate answers.
Inclusion
The inclusion relation exists if a candidate answer entails an-
other answer (for example, between concepts of candidate an-
swers linked in an ontology by the is-a or part-of relations).
For example, in Brittany and in France are correct answers to the question Where is Brest?, and Brittany is a part of France. The content determination stage here consists in choosing which answer will be proposed to the user: the most specific, the most generic, or all of them. This can be guided by a user model taking into account the user's knowledge.
Equivalence
Candidate answers which are linked by an equivalence relation are consistent and entail each other. The corpus analysis allows us to identify two main types of equivalence:
(1) Lexical equivalence: synonymy, metonymy, para-
phrases, proportional series, use of acronyms or foreign
languages. For example, to the question Who killed John
Lennon?, Mark Chapman, the murderer of John Lennon and
John Lennon’s killer Mark Chapman are equivalent answers.
(2) Equivalence with inference: in a number of cases, common knowledge, inferences or calculations are necessary to detect equivalence relations. For example, The A320 is 21 and The A320 has been created in 1984 are equivalent answers to the question How old is the Airbus A320?.
Aggregation
The aggregation relation defines a set of consistent answers
when the question accepts several different ones. In this case,
all candidate answers are potentially correct and can be inte-
grated in the form of a conjunction of all these answers. For
example, an answer to the question Where is Disneyland? can
be in Tokyo, Paris, Hong-Kong and Los Angeles.
If answers are numerical values, the integrated answer can
be given in the form of an interval, average or comparison.
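The interval and average forms of integration for numerical values can be sketched as follows (a minimal sketch; `integrate_numeric` is an illustrative helper, not part of WEBCOOP, shown on candidate distances 713, 678 and 681 km):

```python
from statistics import mean

def integrate_numeric(values):
    # Two integration forms for numeric candidate answers:
    # an average, and an interval covering all candidates.
    return mean(values), (min(values), max(values))

avg, interval = integrate_numeric([713, 678, 681])
print(f"about {int(round(avg, -1))} km")              # about 690 km
print(f"between {interval[0]} and {interval[1]} km")  # between 678 and 713 km
```

The comparison form (e.g. "closer to 700 km than to 650 km") would need additional reference values and is omitted here.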
Alternative
The alternative relation defines a set of inconsistent answers. In the case of questions expecting a unique answer, only one answer among the candidates is correct; otherwise, all candidates can be correct answers.
(1) A simple solution is to propose a disjunction of the candidate answers. For example, if the question When does autumn begin? has the candidate answers Autumn begins on September 21st and Autumn begins on September 20th, an answer such as Autumn begins on either September 20th or September 21st can be proposed.
(2) If candidate answers have common characteristics, it is
possible to integrate them according to these characteristics.
For example, the question When does the French music
festival take place? has the following answers June 1st 1982,
June 21st 1983, ..., June 21st 2004. Here, the extraction en-
gine selects pages containing the dates of all music festivals.
These candidate answers have day and month in common.
Consequently, an answer such as The French music festival
takes place every June 21st can be proposed.
(3) As for the aggregation relation, numerical values
can be integrated in the form of an interval, average or
comparison. For example, if the question How far is Paris
from Toulouse? has the candidate answers 713 km, 678
km and 681 km, answers such as Paris is at about 690 km
from Toulouse (average) or The distance between Paris
and Toulouse is between 678 and 713 km (interval) can be
proposed.
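The generalisation over common characteristics in case (2) can be sketched as follows (an illustrative helper assuming the candidate dates are already parsed; `common_pattern` is not part of WEBCOOP):

```python
from datetime import date

def common_pattern(dates):
    # If all candidate dates share the same day and month,
    # generalise to a yearly event; otherwise give up.
    if len({(d.month, d.day) for d in dates}) == 1:
        return f"every year on {dates[0].strftime('%B %d')}"
    return None

# Music-festival-like candidates sharing June 21st across years.
print(common_pattern([date(1983, 6, 21), date(1995, 6, 21), date(2004, 6, 21)]))
# -> every year on June 21
```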
In the following sections, we focus on the content deter-
mination and generation of candidate answers of type date
linked by an aggregation or alternative relation, the most
common ones.
3 Content determination
The problem we focus on in this section is the problem of
content determination when several answers to a question of
type date are selected. We consider that candidate answers
can be in the form of a date or a temporal interval. A date is defined as a vector which allows the temporal localisation of an event. Some values of the vector can be underspecified: only the values relevant for the expected information are explicit (year, hour, etc.). An interval is then a pair of dates, i.e. two vectors defining a beginning date and an end date.
As answers selected by the extraction engine are often in
different forms (dates or intervals or both), a first step consists
in standardizing data:
- all candidate answers are put in the form of an interval: a date becomes an interval having the same beginning and end dates,
- some candidate answers may be incomplete: for example, the year or the end date is missing. In some cases, unification with other candidate answers is possible; otherwise, incomplete answers are omitted,
- from the semantic point of view, all candidate answers must be in the same system of temporal reference (for example, because of possible different time zones).
Once all candidate answers have been standardized, aber-
rant answers are filtered out by applying classical statistical
methods. Then, the answer selection process can be applied.
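The standardisation and filtering steps might look as follows (a sketch; the paper does not name the statistical method used, so a z-score filter on interval midpoints stands in for it, and unification of incomplete answers and time-zone normalisation are omitted):

```python
from datetime import date
from statistics import mean, stdev

def standardize(answers):
    # Step 1: every answer becomes an interval; a single date d
    # becomes the degenerate interval (d, d).
    return [(a, a) if isinstance(a, date) else tuple(a) for a in answers]

def filter_aberrant(intervals, k=1.0):
    # Drop intervals whose midpoint lies more than k standard deviations
    # from the mean midpoint (one possible "classical statistical method").
    mids = [(b.toordinal() + e.toordinal()) / 2 for b, e in intervals]
    m, s = mean(mids), (stdev(mids) if len(mids) > 1 else 0.0)
    return [iv for iv, mid in zip(intervals, mids) if s == 0 or abs(mid - m) <= k * s]

answers = [date(1989, 9, 16),
           (date(1989, 9, 10), date(1989, 9, 22)),
           date(1975, 1, 1)]                    # aberrant candidate
clean = filter_aberrant(standardize(answers))   # the 1975 answer is dropped
```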
3.1 Answer selection process
Our goal is to select, among several candidate answers, the
best answer considered as the one which is the most coherent
with other answers. For this purpose, we define a coherence
rate of answers.
Let us assume that there are N candidate answers coming from N different web pages. We consider that each candidate answer is a temporal interval [d_b, d_e] where d_b is the beginning date and d_e the end date of the event. Let a_i = [d_bi, d_ei], with 1 ≤ i ≤ N, be these N candidate answers.
In terms of intervals, we consider that the most coherent answer is the interval which intersects the greatest number of candidate intervals. For example, in Figure 1, we have 3 candidate answers a_1, a_2 and a_3. They form 4 sub-intervals: [d_b1, d_b2], [d_b2, d_b3], [d_b3, d_e3] and [d_e3, d_e1]. The interval we consider as the most coherent is [d_b3, d_e3] because its occurrence frequency is 3 (i.e. the number of times it intersects the candidate answers is 3).
In order to define sub-intervals, we need the bounds of the N candidate intervals. Let B = {d_bj, d_ej}, 1 ≤ j ≤ N, be the set of ordered bounds of the N intervals, and let b_i ∈ B, 1 ≤ i ≤ 2N. A sub-interval is then of the form [b_i, b_{i+1}].
Figure 1: Sub-intervals

We now define f_{[b_i, b_{i+1}]} as the occurrence frequency of the sub-interval [b_i, b_{i+1}], i.e. the number of times it intersects the N candidate answers:

for all 1 ≤ i ≤ 2N−1,
f_{[b_i, b_{i+1}]} = card{ a_k | 1 ≤ k ≤ N, [b_i, b_{i+1}] ∩ a_k ≠ ∅ }
Then, the coherence rate t_i assigned to each sub-interval [b_i, b_{i+1}] is a weighting of the occurrence frequency by the number of candidate answers:

for all 1 ≤ i ≤ 2N−1,
t_i = f_{[b_i, b_{i+1}]} / N
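The two definitions above can be sketched as follows (dates as integer day numbers; degenerate sub-intervals arising from duplicate bounds are ignored here for brevity):

```python
def coherence_rates(candidates):
    """Coherence rate t of each sub-interval [b_i, b_{i+1}] formed by
    the ordered bounds of the candidate intervals."""
    n = len(candidates)
    bounds = sorted({b for iv in candidates for b in iv})
    rates = {}
    for lo, hi in zip(bounds, bounds[1:]):
        # occurrence frequency: candidates overlapping the sub-interval
        f = sum(1 for b, e in candidates if b < hi and e > lo)
        rates[(lo, hi)] = f / n
    return rates

# Three nested intervals, as in the Figure 1 configuration.
rates = coherence_rates([(0, 10), (2, 8), (4, 6)])
print(max(rates, key=rates.get))   # (4, 6): it intersects all 3 candidates
```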
Selecting the interval having the highest coherence rate is not sufficient: the answer must also have a relevant duration. For this purpose, we construct new intervals based on the previous sub-intervals; these new intervals must have a relevant duration, at least equal to the average duration d_avg of the N candidate answers.
Then, we construct a coherent answer set composed of intervals satisfying this duration constraint, to which we assign a new coherence rate: the average of the coherence rates of the sub-intervals composing the new interval. The coherent answer set A is defined as:

A = { ([b_i, b_{j+1}], t_{i,j+1}) | 1 ≤ i ≤ j ≤ 2N−1,
      [b_i, b_{j+1}] = the union of the sub-intervals [b_k, b_{k+1}] for i ≤ k ≤ j,
      t_{i,j+1} = ( t_i + ... + t_j ) / (j − i + 1),
      d_avg ≤ d_{[b_i, b_{j+1}]} ≤ d_avg + 1 }

where d_{[b_i, b_{j+1}]} is the duration of [b_i, b_{j+1}].
Once this coherent answer set has been obtained, it remains to check whether the expected answer/event is a unique or an iterative event. We consider that an event is iterative if a great number of intervals of A are distant in time. Let δ be the minimum time between the end of an interval and the beginning of the following one, and let M be the minimum number of intervals that have to be δ-distant from the others (the parameters δ and M depend on data granularity). Then, an event is iterative if:

card{ [b_i, b_j] ∈ A | there exists [b_k, b_l] ∈ A such that b_k − b_j ≥ δ } ≥ M
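The iterativity test can be sketched as follows (day-number intervals; `delta` and `m` stand for the granularity parameters δ and M):

```python
def is_iterative(intervals, delta, m):
    # An event is iterative if at least m intervals of the coherent answer
    # set are separated from the following one by at least delta.
    ivs = sorted(intervals)
    gaps = sum(1 for (_, e), (b, _) in zip(ivs, ivs[1:]) if b - e >= delta)
    return gaps >= m

# Yearly event (five one-day intervals one year apart) vs. contiguous intervals.
print(is_iterative([(y * 365, y * 365) for y in range(5)], delta=30, m=3))  # True
print(is_iterative([(0, 2), (2, 5), (5, 6)], delta=30, m=3))                # False
```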
At this stage, there are two possibilities:
- either the event is unique: the answer set A_final is composed of the intervals of A having the highest coherence rate:
A_final = { ([b_i, b_j], t_{i,j}) | for all ([b_k, b_l], t_{k,l}) ∈ A, t_{i,j} ≥ t_{k,l} }
- or the event is iterative: there may be temporal constraints due to the question: for example, the question expects an event in the past or in the future, an event in a particular year, etc. Let A_C be the set of intervals of A satisfying the question constraints. Then, A_final is the set of answers/intervals (having the highest coherence rate) which can be proposed to the user:
A_final = { ([b_i, b_j], t_{i,j}) | for all ([b_k, b_l], t_{k,l}) ∈ A_C, t_{i,j} ≥ t_{k,l} }
In this section, we proposed a method for content determination based on a coherence rate, for answers of type date and in particular of type interval. In the following section, we apply this method to an example.
3.2 Example
Let us suppose that the question When did hurricane Hugo take place? is submitted to a question-answering system. The following table presents the candidate answers:
Question: When did hurricane Hugo take place?
Candidate answers:
- September 16th, 1989
- September 1989, from 10 to 22
- September 16th, 1989
- September 17th, 1989
- from 10th to 25th September, 1989
- September 16th, 1989
- September 16th, 1989
- from 16th to 22nd September, 1989
- September 1989, from 10 to 25
- September 16th, 1989
- September 16th, 1989
The following table presents the 11 candidate answers in the form of intervals with their respective durations (in days):

a1 = [16-9-1989, 16-9-1989], d1 = 1
a2 = [10-9-1989, 22-9-1989], d2 = 12
a3 = [16-9-1989, 16-9-1989], d3 = 1
a4 = [17-9-1989, 17-9-1989], d4 = 1
a5 = [10-9-1989, 25-9-1989], d5 = 15
a6 = [16-9-1989, 16-9-1989], d6 = 1
a7 = [16-9-1989, 16-9-1989], d7 = 1
a8 = [16-9-1989, 22-9-1989], d8 = 6
a9 = [10-9-1989, 25-9-1989], d9 = 15
a10 = [16-9-1989, 16-9-1989], d10 = 1
a11 = [16-9-1989, 16-9-1989], d11 = 1
The ordered set of interval bounds is, for example:
B = { d_b2, d_b1, d_e1, d_b4, d_e4, d_e2, d_e5 }
Consequently, we have (cf. Figure 2):
b1 = d_b2 = 10-9-1989,
b2 = d_b1 = 16-9-1989,
b3 = d_e1 = 16-9-1989,
b4 = d_b4 = 17-9-1989,
b5 = d_e4 = 17-9-1989,
b6 = d_e2 = 22-9-1989,
b7 = d_e5 = 25-9-1989
Figure 2: 11 candidate answers
The coherence rates of the sub-intervals are:
t1 = f_{[b1,b2]} / N = card{ a_k | [b1,b2] ∩ a_k ≠ ∅ } / 11 = 3/11 = 0.27
t2 = f_{[b2,b3]} / N = 10/11 = 0.91
t3 = f_{[b3,b4]} / N = 4/11 = 0.36
t4 = f_{[b4,b5]} / N = 5/11 = 0.45
t5 = f_{[b5,b6]} / N = 4/11 = 0.36
t6 = f_{[b6,b7]} / N = 2/11 = 0.18
The average duration of the candidate answers is 5 days. Now, we construct the answer set A with intervals having a duration between 5 and 6 days, and we assign to them a new coherence rate:
d_{[b1,b2]} = 6, t_{1,2} = t1 = 0.27
d_{[b1,b3]} = 6, t_{1,3} = (t1 + t2)/2 = 0.59
d_{[b2,b6]} = 6, t_{2,6} = (t2 + t3 + t4 + t5)/4 = 0.52
d_{[b3,b6]} = 6, t_{3,6} = (t3 + t4 + t5)/3 = 0.39
d_{[b4,b6]} = 5, t_{4,6} = (t4 + t5)/2 = 0.41
d_{[b5,b6]} = 5, t_{5,6} = t5 = 0.36
Consequently, the intervals satisfying the average duration are:
A = { [b1,b2], [b1,b3], [b2,b6], [b3,b6], [b4,b6], [b5,b6] }
The event is non-iterative since every interval of A is contiguous to the following one. So, the answer is the interval of A having the highest coherence rate: A_final = { ([b1, b3], 0.59) }, i.e. from September 10th to 16th, 1989.
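The whole computation of this example can be replayed in a few lines (a sketch using day numbers in September 1989; the duplicate-bound construction and the strict-overlap count are reverse-engineered from the worked figures, not stated explicitly in the paper):

```python
from statistics import mean

# The 11 candidate answers as (begin, end) day numbers in September 1989.
cands = [(16, 16), (10, 22), (16, 16), (17, 17), (10, 25), (16, 16),
         (16, 16), (16, 22), (10, 25), (16, 16), (16, 16)]
N = len(cands)

def dur(b, e):
    return max(e - b, 1)            # a single date counts as a 1-day duration

d_avg = sum(dur(b, e) for b, e in cands) / N        # = 5 days

# Ordered bounds: a value appears twice when it is both a begin and an end.
begins, ends = {b for b, _ in cands}, {e for _, e in cands}
B = sorted(v for v in begins | ends
           for _ in range(1 + (v in begins and v in ends)))
subs = list(zip(B, B[1:]))          # the 6 sub-intervals of Figure 2

def freq(lo, hi):
    # Occurrence frequency: point membership for a degenerate sub-interval,
    # overlap of positive length otherwise.
    if lo == hi:
        return sum(1 for b, e in cands if b <= lo <= e)
    return sum(1 for b, e in cands if b < hi and e > lo)

t = [freq(lo, hi) / N for lo, hi in subs]           # 0.27, 0.91, 0.36, ...

# Coherent answer set: unions of consecutive sub-intervals with a duration
# between d_avg and d_avg + 1, rated by the mean of their sub-interval rates.
A = [((subs[i][0], subs[j][1]), mean(t[i:j + 1]))
     for i in range(len(subs)) for j in range(i, len(subs))
     if d_avg <= dur(subs[i][0], subs[j][1]) <= d_avg + 1]

best = max(A, key=lambda iv_rate: iv_rate[1])
print(len(A), best)    # 6 intervals; the best is ((10, 16), 0.59...)
```

Under these assumptions the script recovers the six intervals of the example and the final answer [10-9-1989, 16-9-1989] with coherence rate 0.59.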
4 Answer generation
Once the most coherent answer has been elaborated, it has to
be generated in natural language. Our strategy is to couple
classical NLG techniques with generation templates.
As our framework is the cooperative system WEBCOOP, the
answer proposed to the user has to explain why this answer
has been selected. The idea is to introduce possibility degrees
to explain to the user how confident he can be in the answer.
For this purpose, we define a certainty degree of answers
which depends on several parameters:
- the number of candidate answers N: if N and the coherence rate of the selected answer are high, then there were not many contradictions among candidate answers and the answer is more certain (as N is already taken into account in the coherence rate, this rate alone is a sufficient parameter),
- the difference D between the best coherence rate and the second best one: if D is high, then the selected answer is more certain.
Consequently, we define the certainty degree c_{i,j} of the answer [b_i, b_j] as:

c_{i,j} = 1 if t_{i,j} = 1,
c_{i,j} = D × t_{i,j} otherwise,

where ([b_i, b_j], t_{i,j}) ∈ A_final and D = t_{i,j} − t_{k,l}, with t_{i,j} the best coherence rate and t_{k,l} the second best one.
As 0 ≤ t_{i,j} ≤ 1 and 0 ≤ D ≤ 1, the more c_{i,j} tends towards 1, the more the answer [b_i, b_j] is certain. Thus, we define generation schemas for each type of answer depending on this certainty degree. We distinguish 3 main cases:
(1) either A_final = ∅, i.e. no answer has been selected. The idea is then to select the candidate answer which has the highest coherence rate even if its duration is not appropriate, but the generated answer has to explain that this answer is not sure,
(2) or c_{i,j} = 1, i.e. the selected answer [b_i, b_j] is certain,
(3) or c_{i,j} ≠ 1: the generated answer has to take D into account. If D is low, the coherence rate of the selected answer is very close to the other rates: in this case, several answers are potentially correct and can be proposed to the user.
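The certainty degree can be computed from the coherence rates alone; the sketch below is checked against the figures of examples 2 and 3 later in this section:

```python
def certainty(rates):
    # Certainty degree of the best answer: 1 if its coherence rate is 1,
    # otherwise D * t, where D is the gap to the second best rate.
    srt = sorted(rates, reverse=True)
    best = srt[0]
    if best == 1.0 or len(srt) == 1:
        return best
    return (best - srt[1]) * best

print(round(certainty([0.08, 0.87, 0.04]), 2))   # 0.69  (example 2)
print(round(certainty([0.29, 0.32, 0.33]), 3))   # 0.003 (example 3)
```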
The idea is to generate answers with different certainty degrees depending on c: we choose to express this degree by the use of adverbs. For this purpose, we define a lexicalisation function lex which lexicalises the selected answers and a function lexD which lexicalises c. Table 1 presents the different generation schemas (A is the selected answer and A' the answer having the coherence rate closest to A's). Underlined fragments are predefined texts.

case (1): subject lexD(c, min) verb lex(A, Reg)
case (2): subject verb lex(A, Reg)
case (3): if D is high:
  subject lexD(c) verb lex(A, Reg)
if D is low, A and A' are proposed:
  if A is a date:
    subject lexD(c) verb lex(A, Reg) or lex(A', Reg)
  if A is an interval:
    subject lexD(c, min) verb lex(A', Reg) but lexD(c, plus) lex(A, Reg)

Table 1: Generation schemas
Adverb intensity is represented by a proportional series (cf. Figure 3).

Figure 3: Adverb intensity

Consequently, if c is high, it will be lexicalised by an adverb of high intensity. The second argument of the function lexD (min or plus) forces the function to lexicalise c as an adverb of lower or higher intensity than the one that would normally have been used (cases (1) and (3)).
The lex function has 2 arguments: the answers that have to be generated and Reg, indicating whether the event is regular or not. Indeed, if an iterative event is regular, i.e. happens at regular intervals (the parameter δ is the same for all answers of A), then a generalisation can be made on common characteristics. For example, if δ = 1 year, a possible generalisation is: X takes place every year on ....
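A possible implementation of lexD is sketched below; since the adverb scale of Figure 3 is not recoverable from the extraction, the thresholds and adverbs are purely illustrative:

```python
# Illustrative proportional series of adverbs, from high to low intensity.
SCALE = [(0.9, "certainly"), (0.5, "probably"), (0.2, "possibly"), (0.0, "perhaps")]

def lexD(c, force=None):
    # Pick the adverb whose threshold the certainty degree reaches;
    # force="min"/"plus" shifts intensity one step down/up (cases (1) and (3)).
    i = next(k for k, (th, _) in enumerate(SCALE) if c >= th)
    if force == "plus":
        i = max(i - 1, 0)
    elif force == "min":
        i = min(i + 1, len(SCALE) - 1)
    return SCALE[i][1]

print(lexD(0.69))            # probably
print(lexD(0.003, "plus"))   # possibly
```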
Example 1
To the question When was Chomsky born?, the only potential answer and its coherence rate is ([07-12-1928, 07-12-1928], 1). Its certainty degree is c = 1.
We are in case (2), so the generated answer is of the form:
subject verb lex(A, Reg).
The answer is not a regular event. Consequently, the answer in natural language is:
Chomsky was born on December 7th, 1928.
Example 2
To the question In which year did D. Tutu receive the Nobel Peace Prize?, the potential answers and their respective coherence rates are: (1931, 0.08), (1984, 0.87) and (1986, 0.04). The answer (1984, 0.87) is selected because it has the highest coherence rate, and its certainty degree is:
c = (0.87 − 0.08) × 0.87 = 0.69
We are in case (3) with a high D (0.87 − 0.08), so the generated answer is of the form:
subject lexD(c) verb lex(A, Reg).
The answer is not a regular event and its certainty degree is high, so the adverb intensity has to be high. Consequently, the answer in natural language is:
D. Tutu probably received the Nobel Peace Prize in 1984.
Example 3
To the question When did the American Civil War take place?, the potential answers and their respective coherence rates are:
- ([01-01-1861, 09-04-1865], 0.29),
- ([12-04-1861, 09-04-1865], 0.32),
- ([17-04-1861, 09-04-1865], 0.33).
The answer ([17-04-1861, 09-04-1865], 0.33) is selected because it has the highest coherence rate, and its certainty degree is:
c = (0.33 − 0.32) × 0.33 = 0.003
We are in case (3) with a low D (0.33 − 0.32) and the answer is an interval, so the generated answer is of the form:
subject lexD(c, min) verb lex(A', Reg) but lexD(c, plus) lex(A, Reg),
with A' = [01-01-1861, 09-04-1865] (since all the other answers have quasi-similar coherence rates, A' is the interval including all the others). The answer is not a regular event and its certainty degree is very low, so the adverb intensity has to be very low. Consequently, the answer in natural language is:
The American Civil War possibly took place from 1861 to April 9th, 1865 but most possibly from April 17th, 1861 to April 9th, 1865.
In this paper, we did not detail the lexicalisation of dates, but classical lexicalisation and aggregation techniques are applied, for example to group common characteristics (from September 10th to 22nd instead of from September 10th to September 22nd, etc.).
5 Evaluation
We evaluated our approach by applying our answer selection method to 72 questions expecting an answer of type date. Among these questions, 36 expected a precise date and 36 expected a temporal interval.
These 72 questions were submitted to QRISTAL. Applying our answer selection process (called Cont.Det. in the following tables), we distinguish several cases: either the proposed answer is correct, or it is incorrect, or the proposed answer is included in the interval defining the exact date of the event, or the answer is incomplete. We note "impossible" the cases where it is impossible to select an answer (when all candidate answers have the same occurrence frequency).
Figure 4: Evaluation on 72 questions
We compared the results of our content determination method not only to QRISTAL's results but also to the results obtained by a "most frequent answer" method. Our approach obtains better results on questions expecting an answer of type temporal interval, and particularly on questions about iterative events (for example, When does the next X take place?, When did the first Y happen?, ...). This is partly due to the fact that a "most frequent answer" method is not able to resolve temporal references.
Among the "incorrect" answers, most errors can be explained by the fact that some incorrect candidate answers introduce a bias in the calculation of the average duration. A way to solve this problem is to eliminate some candidate answers by analysing their contexts of occurrence in more depth. Linguistic information and semantic knowledge about answer concepts may make it possible to determine whether a candidate answer selected by QRISTAL is appropriate, incomplete, etc.
6 Conclusion
In this paper, we presented an approach for content determination, based on a coherence rate, and surface generation, based on a certainty degree of answers, in a question-answering system on the web. Several future directions are considered:
- a deeper analysis of the contexts of occurrence of candidate answers in order to filter out incorrect answers or to make some of them more precise. This analysis will avoid having answers which introduce a bias in the calculations,
- an evaluation of the quality of the answers in natural language: are adverbs sufficient to explain the certainty degree of the answer?

References
[Benamara, 2004] F. Benamara. WEBCOOP: un système question-réponse coopératif sur le Web. PhD Thesis, Université Paul Sabatier, Toulouse, 2004.
[Chalendar et al., 2002] G. de Chalendar, T. Delmas, F.
Elkateb, O. Ferret, B. Grau, M. Hurault-Plantet, G. Il-
louz, L. Monceaux, I. Robba, A. Vilnat. The Question-
Answering system QALC at LIMSI, Experiments in using
Web and WordNet. In Proceedings of TREC 11, 2002.
[Harabagiu et al., 2004] S. Harabagiu, F. Lacatusu. Strate-
gies for Advanced Question Answering. In Proceedings
of the Workshop on Pragmatics of Question Answering
at HLT-NAACL 2004.
[McGuinness et al., 2004] D.L. McGuinness, P. Pinheiro da
Silva. Trusting Answers on the Web. New Directions in
Question-Answering, chapter 22, Mark T. Maybury (ed),
AAAI/MIT Press, 2004.
[Moldovan et al., 2003] D. Moldovan, C. Clark, S.
Harabagiu, S. Maiorano. COGEX: A Logic Prover
for Question Answering. In Proceedings of HLT-NAACL
2003.
[Radev et al., 1998] D.R. Radev, K.R. McKeown. Generat-
ing Natural Language Summaries from Multiple On-Line
Sources. Computational Linguistics, vol. 24, issue 3 -
Natural Language Generation, pp. 469 - 500, 1998.
[Webber et al., 2002] B. Webber, C. Gardent, J. Bos. Position statement: Inference in Question Answering. In Proceedings of LREC, 2002.
[QRISTAL] Question-Réponse Intégrant un Système de Traitement Automatique des Langues. www.qristal.fr, Synapse Développement, 2004.
