Proceedings of the ACL Workshop on Empirical Modeling of Semantic Equivalence and Entailment, pages 37–42,
Ann Arbor, June 2005. c©2005 Association for Computational Linguistics
Discovering entailment relations using “textual entailment patterns”
Fabio Massimo Zanzotto
DISCo, University of Milano-Bicocca,
Via Bicocca degli Arcimboldi 8, Milano, Italy,
zanzotto@disco.unimib.it
Maria Teresa Pazienza, Marco Pennacchiotti
DISP, University of Rome “Tor Vergata”,
Viale del Politecnico 1, Roma, Italy,
{pennacchiotti, pazienza}@info.uniroma2.it
Abstract
In this work we investigate methods to en-
able the detection of a specific type of tex-
tual entailment (strict entailment), start-
ing from the preliminary assumption that
these relations are often clearly expressed
in texts. Our method is a statistical ap-
proach based on what we call textual en-
tailment patterns, prototypical sentences
hiding entailment relations among two ac-
tivities. We experimented the proposed
method using the entailment relations of
WordNet as test case and the web as cor-
pus where to estimate the probabilities;
obtained results will be shown.
1 Introduction
Textual entailment has been recently defined as a
common solution for modelling language variability
in different NLP tasks (Glickman and Dagan, 2004).
Roughly, the problem is to recognise if a given tex-
tual expression, the text (t), entails another expres-
sion, the hypothesis (h). An example is determining
whether or not “Yahoo acquired Overture (t) entails
Yahoo owns Overture (h)”. More formally, the prob-
lem of determining a textual entailment between t
and h is to find a possibly graded truth value for the
entailment relation t → h.
Since the task involves natural language expres-
sions, textual entailment has a more difficult nature
with respect to logic entailment, as it hides two dif-
ferent problems: paraphrase detection and what can
be called strict entailment detection. Generally, this
task is faced under the simplifying assumption that
the analysed text fragments represent facts (ft for
the ones in the text and fh for those in the hypothe-
sis) in an assertive or negative way. Paraphrase de-
tection is then needed when the hypothesis h carries
a fact f that is also in the target text t but is described
with different words, e.g., Yahoo acquired Overture
vs. Yahoo bought Overture. On the other hand, strict
entailment emerges when target sentences carry dif-
ferent facts, fh negationslash= ft. The challenge here is to derive
the truth value of the entailment ft → fh. For exam-
ple, a strict entailment is “Yahoo acquired Overture
→ Yahoo owns Overture”. In fact, it does not de-
pend on the possible paraphrasing between the two
expressions but on an entailment of the two facts
governed by acquire and own.
Whatever the form of textual entailment is, the
real research challenge consists in finding a rel-
evant number of textual entailment prototype re-
lations such as “X acquired Y entails X owns Y” or
“X acquired Y entails X bought Y” that can be used
to recognise entailment relations. Methods for ac-
quiring such textual entailment prototype relations
are based on the assumption that specific facts are
often repeated in possibly different linguistic forms.
These forms may be retrieved using their anchors,
generally nouns or noun phrases completely char-
acterising specific facts. The retrieved text frag-
ments are thus considered alternative expressions
for the same fact. This supposed equivalence is
then exploited to derive textual entailment proto-
type relations. For example, the specific fact Yahoo
bought Overture is characterised by the two anchors
37
{Yahoo, Overture}, that are used to retrieve in the
corpus text fragments where they co-occur, e.g. “Ya-
hoo purchased Overture (July 2003).”, “Now that
Overture is completely owned by Yahoo!...”. These
retrieved text fragments are then considered good
candidate for paraphrasing X bought Y.
Anchor-based learning methods have been used
to investigate many semantic relations ranging from
very general ones as the isa relation in (Morin, 1999)
to very specific ones as in (Ravichandran and Hovy,
2002) where paraphrases of question-answer pairs
are searched in the web or as in (Szpektor et al.,
2004) where a method to scan the web for searching
textual entailment prototype relations is presented.
These methods are mainly devoted to induce entail-
ment pairs related to the first kind of textual entail-
ment, that is, paraphrasing as their target is mainly
to look for the same “fact” in different textual forms.
Incidentally, these methods can come across strict
entailment relations whenever specific anchors are
used for both a fact ft and a strictly entailed fact fh.
In this work we will investigate specific meth-
ods to induce the second kind of textual entailment
relations, that is, strict entailment. We will focus
on entailment between verbs, due to the fact that
verbs generally govern the meaning of sentences.
The problem we are facing is to look for (or ver-
ify) entailment relations like vt → vh (where vt is
the text verb and vh the hypothesis verb). Our ap-
proach is based on an intuition: strict entailment re-
lations among verbs are often clearly expressed in
texts. For instance the text fragment “Player wins
$50K in Montana Cash” hides an entailment rela-
tion between two activities, namely play and win. If
someone wins, he has first of all to play, thus, win →
play. The idea exploits the existence of what can be
called textual entailment pattern, a prototypical sen-
tence hiding an entailment relation among two activ-
ities. In the abovementioned example the pattern in-
stance player win subsumes the entailment relation
“win → play”.
In the following we will firstly describe in Sec.
2 our method to recognise entailment relations be-
tween verbs that uses: (1) the prior linguistic knowl-
edge of these textual entailment patterns and (2) sta-
tistical models to assess stability of the implied re-
lations in a corpus. Then, we will experiment our
method by using the WordNet entailment relations
as test cases and the web as corpus where to esti-
mate the probabilities (Sec. 3). Finally we will draw
some conclusions (Sec. 4).
2 The method
Discovering entailment relations within texts im-
plies the understanding of two aspects: firstly, how
these entailment relations are usually expressed and,
secondly, when an entailment relation may be con-
sidered stable and commonly shared. Assessing the
first aspect requires the investigation of which are
the prototypical textual forms that describe entail-
ment relations. We will call them textual entailment
patterns. These patterns (analysed in Sec. 2.2) will
enable the detection of point-wise entailment asser-
tions, that is, candidate verb pairs that still need a
further step of analysis in order to be considered
true entailment expressions. In fact, some of these
candidates may be not enough stable and commonly
shared in the language to be considered true en-
tailments. To better deal with this second aspect,
methods for statistically analysing large corpora are
needed (see later in Sec. 2.3).
The method we propose may be used in either: (1)
recognising if entailment holds between two verbs,
or, (2) extracting from a corpus C all the implied
entailment relations. In recognition, given a verb
pair, the related textual entailment expressions are
derived as instances of the textual entailment pat-
terns and, then, the statistical entailment indicators
on a corpus C are computed to evaluate the stability
of the relation. In extraction, the corpus C should
be scanned to extract textual expressions that are in-
stances of the textual entailment patterns. The re-
sulting pairs are sorted according to the statistical
entailment indicators and only the best ranked are
retained as useful verb entailment pairs.
2.1 An intuition
Our method stems from an observation: verb logical
subjects, as any verb role filler, have to satisfy spe-
cific preconditions as the theory of selectional re-
strictions suggests. Then, if in a given sentence a
verb v has a specific logical subject x, its selectional
restrictions imply that the subject has to satisfy some
preconditions p, that is, v(x) → p(x). This can be
read also as: if x has the property of doing the action
38
v this implies that x has the property p. For example,
if the verb is to eat, the selectional restrictions of eat
would imply, among other things, that its subject is
an animal. If the precondition p is “having the prop-
erty of doing an action a”, the constraint may imply
that the action v entails the action a, that is, v → a.
As for selectional restriction acquisition, the pre-
vious observation can enable the use of corpora as
enormous sources of candidate entailment relations
among verbs. For example “John McEnroe won the
match...” can contribute to the definition of the selec-
tional restriction win(x) → human(x) (since John
McEnroe is a human), as well as to the induction (or
verification) of the entailment relation between win
and play, since John McEnroe has the property of
playing. However, as the example shows, classes
relevant for acquiring selectional preferences may
be more explicit than active properties useful to de-
rive entailment relations (i.e., it is easier to derive
that John McEnroe is a human than that he has the
property of playing).
This limitation can be overcome when agentive
nouns such as runner play subject roles in some sen-
tences. Agentive nouns usually denote the “doer” or
“performer” of some action a. This is exactly what
is needed to make clearer the relevant property of
the noun playing the logical subject role, in order to
discover entailment. The action a will be the one en-
tailed by the verb heading the sentence. For exam-
ple, in “the player wins”, the action play evocated
by the agentive noun player is entailed by win.
2.2 Textual entailment patterns
As observed for the isa relations in (Hearst, 1992)
local and simple inter-sentential patterns may carry
relevant semantic relations. As we saw in the pre-
vious section, this also happens for entailment re-
lations. Our aim is thus to search for an initial set
of textual patterns that describe possible linguistic
forms expressing entailment relations between two
verbs (vt,vh). By using these patterns, actual point-
wise assertions of entailment can be detected or ver-
ified in texts. We call these prototypical patterns tex-
tual entailment patterns.
The idea described in Sec. 2.1 can be straight-
forwardly applied to generate textual entailment pat-
terns, as it often happens that verbs can undergo an
agentive nominalization (hereafter called personifi-
cation), e.g., play vs. player. Whether or not an
entailment relation between two verbs (vt,vh) holds
according to some writer can be verified looking for
sentences with expressions involving the agentive
nominalization of the hypothesis verb vh. Then, the
procedure to verify if entailment between two verbs
(vt,vh) holds in a point-wise assertion is: whenever
it is possible to personify the hypothesis vh, scan the
corpus to detect the expressions where the personi-
fied hypothesis verb is the subject of a clause gov-
erned by the text verb vt.
Given the two investigated verbs (vt,vh) we will
refer to this first set of textual entailment patterns
as personified patterns Ppers(vt,vh). This set will
contain the following textual patterns:
Ppers(vt,vh) =
{“pers(vh)|number:sing vt|person:third,tense:present”,
“pers(vh)|number:plur vt|person:nothird,tense:present”,
“pers(vh)|number:sing vt|tense:past”,
“pers(vh)|number:plur vt|tense:past”}
where pers(v) is the noun deriving from the person-
ification of the verb v and elements such as l|f1,...,fN
are the tokens generated from lemmas l by apply-
ing constraints expressed via the features f1,...,fN.
For example, in the case of the verbs play and win,
the related set of textual entailment expressions de-
rived from the patterns will be Ppers(win,play)
= { “player wins”, “players win”, “player won”,
“players won” }. In the experiments hereafter de-
scribed, the required verbal inflections (except per-
sonification) have been obtained using the publicly
available morphological tools described in (Minnen
et al., 2001) whilst simple heuristics have been used
to personify verbs1.
As the statistical measures introduced in the fol-
lowing section are those usually used for study-
ing co-occurrences, two more sets of expressions,
Fpers(v) and F(v), are needed to represent the sin-
gle events in the pair. These are defined as:
Fpers(v) = {“pers(v)|number:sing”,“pers(v)|number:plur”}
F(v) = {“v|person:third,tense:present”,
“v|person:nothird,tense:present”,“v|tense:past”}
1Personification, i.e. agentive nominalization, has been ob-
tained adding “-er” to the verb root taking into account possible
special cases such as verbs ending in “-y”. A form is retained
as a correct personification if it is in WordNet.
39
2.3 Measures to estimate the entailment
strength
The above textual entailment patterns define point-
wise entailment assertions. In fact, if pattern in-
stances are found in texts, the only conclusion that
may be drawn is that someone (the author of the
text) sustains the related entailment pairs. A sen-
tence like “Painter draws on old techniques but cre-
ates only decorative objects.” suggests that painting
entails drawing. However, it may happen that these
correctly detected entailments are accidental, that is,
the detected relation is only valid for that given text.
For example, the text fragment “When a painter dis-
covers this hidden treasure, other people are imme-
diately struck by its beauty.” if taken in insulation
suggests that painting entails discovering, but this is
questionable. Furthermore, it may also happen that
patterns detect wrong cases due to ambiguous ex-
pressions like “Painter draws inspiration from for-
est, field” where the sense of the verb draw is not
the one expected.
In order to get rid of these wrong verb pairs, an
assessment of point-wise entailment assertions over
a corpus is needed to understand how much the de-
rived entailment relations are shared and commonly
agreed. This validation activity can be obtained by
both analysing large textual collections and applying
statistical measures relevant for the task.
Before introducing the statistical entailment indi-
cators, some definitions are necessary. Given a cor-
pus C containing samples, we will refer to the abso-
lute frequency of a textual expression t in the corpus
C with fC(t). The definition is easily extended to a
set of expressions T as follows:
fC(T) =summationdisplay
t∈T
fC(t)
Given a pair vt and vh we may thus define the fol-
lowing entailment strength indicators S(vt,vh), re-
lated to more general statistical measures.
The first relevance indicator, Sf(vt,vh), is related
to the probability of the textual entailment pattern
as it is. This probability may be represented by the
frequency, as the fixed corpus C makes constant the
total number of pairs:
Sf(vt,vh) = log10(fC(Ppers(vt,vh)))
where logarithm is used to contrast the effect of the
Zipf’s law. This measure is often positively used in
terminology extraction (e.g., (Daille, 1994)).
Secondly, another measure Smi(vt,vh) related to
point-wise mutual information (Fano, 1961) may
be also used. Given the possibility of estimating
the probabilities through maximum-likelihood prin-
ciple, the definition is straightforward:
Smi(vt,vh) = log10 p(Ppers(vt,vh))p(F
pers(vt))p(F(vh))
where p(x) = fC(x)/fC(.). The aim of this mea-
sure is to indicate the relatedness between two el-
ements composing a pair. Mutual information has
been positively used in many NLP tasks such as col-
location analysis (Church and Hanks, 1989), termi-
nology extraction (Damerau, 1993), and word sense
disambiguation (Brown et al., 1991).
3 Experimental Evaluation
As many other corpus linguistic approaches, our en-
tailment detection model relies partially on some lin-
guistic prior knowledge (the expected structure of
the searched collocations, i.e., the textual entailment
patterns) and partially on some probability distribu-
tion estimation. Only a positive combination of both
these two ingredients can give good results when ap-
plying (and evaluating) the model.
The aim of the experimental evaluation is then to
understand, on the one side, if the proposed textual
entailment patterns are useful to detect entailment
between verbs and, on the other, if a statistical mea-
sure is preferable with respect to the other. We will
here evaluate the capability of our method to recog-
nise entailment between given pairs of verbs.
We carried out the experiments using the web as
the corpus C where to estimate our two textual en-
tailment measures (Sf and Smi) and GoogleTM as
a count estimator. The findings described in (Keller
and Lapata, 2003) seem to suggest that count estima-
tions we need in the present study over Subject-Verb
bigrams are highly correlated to corpus counts.
As test bed we used existing resources: a non triv-
ial set of controlled verb entailment pairs is in fact
contained in WordNet (Miller, 1995). There, the en-
tailment relation is a semantic relation defined at the
synset level, standing in the verb subhierarchy. Each
40
Figure 1: ROC curves
pair of synsets (St,Sh) is an oriented entailment re-
lation between St and Sh. WordNet contains 415
entailed synsets. These entailment relations are con-
sequently stated also at the lexical level. The pair
(St,Sh) naturally implies that vt entails vh for each
possible vt ∈ St and vh ∈ Sh. It is then possible
to derive from the 415 entailment synset a test set of
2,250 verb pairs. As the proposed model is appli-
cable only when hypotheses can be personified, the
number of the pairs relevant for the experiment is
thus reduced to 856. This set is hereafter called the
True Set (TS).
As the True Set is our starting point for the eval-
uation, it is not possible to produce a natural distri-
bution in the verb pair space between entailed and
not-entailed elements. Then, precision, recall, and
f-measure are not applicable. The only solution is
to use a ROC (Green and Swets, 1996) curve mix-
ing sensitity and specificity. What we then need is a
Control Set (CS) of verb pairs that in principle are
not in entailment relation. The Control Set has been
randomly built on the basis of the True Set: given
the set of all the hypothesis verbs H and the set of
all the text verbs T of the True Set, control pairs are
obtained randomly extracting one element from H
and one element from T. A pair is considered a con-
trol pair if it is not in the True Set. For comparative
purposes the Control Set has the same cardinality
of the True Set. However, even if the intersection
between the True Set and the Control Set is empty,
we are not completely sure that the Control Set does
not contains any pair where the entailment relation
holds. What we may assume is that this last set at
least contains a smaller number of positive pairs.
Sensitivity, i.e. the probability of having positive
answers for positive pairs, and specificity, i.e. the
probability of having negative answers for negative
pairs, are then defined as:
Sensitivity(t) = p((vh,vt) ∈ TS|S(vh,vt) > t)
Specificity(t) = p((vh,vt) ∈ CS|S(vh,vt) < t)
where p((vh,vt) ∈ TS|S(vh,vt) > t) is the prob-
ability of a candidate pair (vh,vt) to belong to TS
if the test is positive, i.e. the value S(vh,vt) of the
entailment detection measure is greater than t, while
p((vh,vt) ∈ CS|S(vh,vt) < t) is the probability
of belonging to CS if the test is negative. The ROC
curve (Sensitivity vs. 1 − Specificity) naturally
follows (see Fig. 1).
Results are encouraging as textual entailment pat-
terns show a positive correlation with the entailment
relation. Both ROC curves, the one related to the fre-
quency indicator Sf (f in figure) and the one related
to the mutual information SMI (MI in figure), are
above the Baseline curve. Moreover, both curves
are above the second baseline (Baseline2) applica-
ble when it is really possible to use the indicators. In
fact, textual entailment patterns have a non-zero fre-
quency only for 61.4% of the elements in the True
Set. This is true also for 48.1% of the elements in the
Control Set. The presence-absence in the corpus is
then already an indicator for the entailment relation
of verb pairs, but the application of the two indica-
tors can help in deciding among elements that have
a non-zero frequency in the corpus. Finally, in this
case, mutual information appears to be a better indi-
cator for the entailment relation with respect to the
frequency.
4 Conclusions
We have defined a method to recognise and extract
entailment relations between verb pairs based on
what we call textual entailment pattern. In this work
we defined a first kernel of textual entailment pat-
terns based on subject-verb relations. Potentials of
the method are still high as different kinds of textual
41
entailment patterns may be defined or discovered
investigating relations between sentences and sub-
sentences as done in (Lapata and Lascarides, 2004)
for temporal relations or between near sentences as
done in (Basili et al., 2003) for cause-effect relations
between domain events. Some interesting and sim-
ple inter-sentential patters are defined in (Chklovski
and Pantel, 2004). Moreover, with respect to anchor-
based approaches, the method we presented here
offers a different point of view on the problem of
acquiring textual entailment relation prototypes, as
textual entailment patterns do not depend on the rep-
etition of “similar” facts. This practically indepen-
dent view may open the possibility to experiment
co-training algorithms (Blum and Mitchell, 1998)
also in this area. Finally, the approach proposed can
be useful to define better probability estimations in
probabilistic entailment detection methods such as
the one described in (Glickman et al., 2005).

References
Roberto Basili, Maria Teresa Pazienza, and Fabio Mas-
simo Zanzotto. 2003. Inducing hyperlinking rules
from text collections. In Proceedings of the Interna-
tional Conference Recent Advances of Natural Lan-
guage Processing (RANLP-2003), Borovets, Bulgaria.
Avrim Blum and Tom Mitchell. 1998. Combin-
ing labeled and unlabeled data with co-training. In
COLT: Proceedings of the Conference on Computa-
tional Learning Theory. Morgan Kaufmann.
Peter F. Brown, Stephen A. Della Pietra, Vincent J.
Della Pietra, and Robert L. Mercer. 1991. Word-sense
disambiguation using statistical methods. In Proceed-
ings of the 29th Annual Meeting of the Association for
Computational Linguistics (ACL), Berkely, CA.
Timoty Chklovski and Patrick Pantel. 2004. VerbO-
CEAN: Mining the web for fine-grained semantic verb
relations. In Proceedings of the 2004 Conference on
Empirical Methods in Natural Language Processing,
Barcellona, Spain.
K.W. Church and P. Hanks. 1989. Word association
norms, mutual information and lexicography. In Pro-
ceedings of the 27th Annual Meeting of the Associa-
tion for Computational Linguistics (ACL), Vancouver,
Canada.
Beatrice Daille. 1994. Approche mixte pour l’extraction
de terminologie: statistque lexicale et filtres linguis-
tiques. Ph.D. thesis, C2V, TALANA, Universit`e Paris
VII.
F.J. Damerau. 1993. Evaluating domain-oriented multi-
word terms from text. Information Processing and
Management, 29(4):433–447.
R.M. Fano. 1961. Transmission of Information: a sta-
tistical theory of communications. MIT Press, Cam-
bridge,MA.
Oren Glickman and Ido Dagan. 2004. Probabilistic
textual entailment: Generic applied modeling of lan-
guage variability. In Proceedings of the Workshop on
Learning Methods for Text Understanding and Mining,
Grenoble, France.
Oren Glickman, Ido Dagan, and Moshe Koppel. 2005.
Web based probabilistic textual entailment. In Pro-
ceedings of the 1st Pascal Challenge Workshop,
Southampton, UK.
D.M. Green and J.A. Swets. 1996. Signal Detection The-
ory and Psychophysics. John Wiley and Sons, New
York, USA.
Marti A. Hearst. 1992. Automatic acquisition of hy-
ponyms from large text corpora. In Proceedings of
the 15th International Conference on Computational
Linguistics (CoLing-92), Nantes, France.
Frank Keller and Mirella Lapata. 2003. Using the web to
obtain frequencies for unseen bigrams. Computational
Linguistics, 29(3), September.
Mirella Lapata and Alex Lascarides. 2004. Inferring
sentence-internal temporal relations. In Proceedings
of the Human Language Technology Conference of the
North American Chapter of the Association for Com-
putational Linguistics, Boston, MA.
George A. Miller. 1995. WordNet: A lexical database for
English. Communications of the ACM, 38(11):39–41,
November.
G. Minnen, J. Carroll, and D. Pearce. 2001. Applied
morphological processing of english. Natural Lan-
guage Engineering, 7(3):207–223.
Emmanuel Morin. 1999. Extraction de liens
s´emantiques entre termes `a partir de corpus de textes
techniques. Ph.D. thesis, Univesit´e de Nantes, Facult´e
des Sciences et de Techniques.
Deepak Ravichandran and Eduard Hovy. 2002. Learning
surface text patterns for a question answering system.
In Proceedings of the 40th ACL Meeting, Philadelphia,
Pennsilvania.
Idan Szpektor, Hristo Tanev, Ido Dagan, and Bonaventura
Coppola. 2004. Scaling web-based acquisition of en-
tailment relations. In Proceedings of the 2004 Confer-
ence on Empirical Methods in Natural Language Pro-
cessing, Barcellona, Spain.
