Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language
Processing (HLT/EMNLP), pages 387–394, Vancouver, October 2005. c©2005 Association for Computational Linguistics
Robust Textual Inference via Graph Matching
Aria D. Haghighi
Dept. of Computer Science
Stanford University
Stanford, CA
aria42@stanford.edu
Andrew Y. Ng
Dept. of Computer Science
Stanford University
Stanford, CA
ang@cs.stanford.edu
Christopher D. Manning
Dept. of Computer Science
Stanford University
Stanford, CA
manning@cs.stanford.edu
Abstract
We present a system for deciding whether
a given sentence can be inferred from
text. Each sentence is represented as a
directed graph (extracted from a depen-
dency parser) in which the nodes repre-
sent words or phrases, and the links repre-
sent syntactic and semantic relationships.
We develop a learned graph matching ap-
proach to approximate entailment, using
the amount of the hypothesis's semantic
content that is contained in the text. We
present results on the Recognizing Textual
Entailment dataset (Dagan et al., 2005),
and show that our approach outperforms
Bag-Of-Words and TF-IDF models. In ad-
dition, we explore common sources of er-
rors in our approach and how to remedy
them.
1 Introduction
A fundamental stumbling block for several NLP ap-
plications is the lack of robust and accurate seman-
tic inference. For instance, question answering sys-
tems must be able to recognize, or infer, an answer
which may be expressed differently from the query.
Information extraction systems must also be able
to recognize the variability of equivalent linguistic
expressions. Document summarization systems must
generate succinct sentences which express the same
content as the original document. In Machine Trans-
lation evaluation, we must be able to recognize legit-
imate translations which structurally differ from our
reference translation.
One sub-task underlying these applications is the
ability to recognize semantic entailment; whether
one piece of text follows from another. In contrast
to recent work which has successfully utilized logic-
based abductive approaches to inference (Moldovan
et al., 2003; Raina et al., 2005b), we adopt a graph-
based representation of sentences, and use a graph-
matching approach to measure the semantic over-
lap of text. Graph matching techniques have proven
to be a useful approach for tractable approximate
matching in other domains including computer vi-
sion. In the domain of language, graphs provide
a natural way to express the dependencies between
words and phrases in a sentence. Furthermore,
graph matching also has the advantage of providing
a framework for structural matching of phrases that
would be difficult to resolve at the level of individual
words.
2 Task Definition and Data
We describe our approach in the context of the 2005
Recognizing Textual Entailment (RTE) Challenge
(Dagan et al., 2005), but note that our approach eas-
ily extends to other related inference tasks. The sys-
tem presented here was one component of our re-
search group’s 2005 RTE submission (Raina et al.,
2005a) which was the top-ranking system according
to one of the two evaluation metrics.
In the 2005 RTE domain, we are given a set of
pairs, each consisting of two parts: 1) the text, a
[Figure 1 graphic: the parse tree (S over NP-Bezos and VP-established, which covers NP-company), with each phrase annotated with its head word, and the corresponding dependency graph: establish (VBD) linked to Bezos (person) by Subj (Agent) and to company (organization) by Obj (Patient).]

Figure 1: An example parse tree and the corresponding dependency graph. Each phrase of the parse tree is annotated with its head word, and the parenthetical edge labels in the dependency graph correspond to semantic roles.
small passage,1 and the hypothesis, a single sen-
tence. Our task is to decide if the hypothesis is “en-
tailed” by the text. Here, “entails” does not mean
strict logical implication, but roughly means that
a competent speaker with basic world-knowledge
would be happy to conclude the hypothesis given the
text. This criterion has an aspect of relevance logic
as opposed to material implication: while various
additional background information may be needed
for the hypothesis to follow, the text must substan-
tially support the hypothesis.
Despite the informality of the criterion and the
fact that the available world knowledge is left
unspecified, human judges show extremely good
agreement on this task – 3 human judges indepen-
dent of the organizers calculated agreement rates
with the released data set ranging from 91–96% (Da-
gan et al., 2005). We believe that this in part reflects
that the task is fairly natural to human beings. For
a flavor of the nature (and difficulty) of the task, see
Table 1.
We give results on the data provided for the RTE
task which consists of 567 development pairs and
800 test pairs. In both sets the pairs are divided into
7 tasks – each containing roughly the same number
of entailed and not-entailed instances – which were
used as both motivation and means for obtaining and
constructing the data items. We will use the follow-
ing toy example to illustrate our representation and
matching technique:
Text: In 1994, Amazon.com was founded by Jeff Bezos.
Hypothesis: Bezos established a company.
1Usually a single sentence, but occasionally longer.
3 Semantic Representation
3.1 The Need for Dependencies
Perhaps the most common representation of text for
assessing content is “Bag-Of-Words” or “Bag-of-N-
Grams” (Papineni et al., 2002). However, such rep-
resentations lose syntactic information which can
be essential to determining entailment. Consider a
Question Answering system searching for an answer
to When was Israel established? A representation
which did not utilize syntax would probably enthusi-
astically return an answer from (the 2005 RTE text):
The National Institute for Psychobiology in Israel
was established in 1979.
In this example, it’s important to try to match rela-
tionships as well as words. In particular, any answer
to the question should preserve the dependency be-
tween Israel and established. However, in the pro-
posed answer, the expected dependency is missing
although all the words are present.
Our approach is to view sentences as graphs be-
tween words and phrases, where dependency rela-
tionships, as in (Lin and Pantel, 2001), are charac-
terized by the path between vertices.
Given this representation, we judge entailment by
measuring not only how many of the hypothesis ver-
tices are matched to the text but also how well the
relationships between vertices in the hypothesis are
preserved in their textual counterparts. For the re-
mainder of the section we outline how we produce
graphs from text, and in the next section we intro-
duce our graph matching model.
3.2 From Text To Graphs
Starting with raw English text, we use a version of
the parser described in (Klein and Manning, 2003)
to obtain a parse tree. Then, we derive a dependency
tree representation of the sentence using a slightly
modified version of Collins’ head propagation rules
(Collins, 1999), which make main verbs, not auxiliaries,
the heads of sentences. Edges in the dependency
graph are labeled by a set of hand-created
tgrep expressions. These labels represent “sur-
face” syntax relationships such as subj for subject
and amod for adjective modifier, similar to the rela-
tions in Minipar (Lin and Pantel, 2001). The depen-
dency graph is the basis for our graphical represen-
tation, but it is enhanced in the following ways:
Task: Question Answering (QA)
  Text: Prince Charles was previously married to Princess Diana, who died in a car crash in Paris in August 1997.
  Hypothesis: Prince Charles and Princess Diana got married in August 1997.
  Entailed: False

Task: Machine Translation (MT)
  Text: Sultan Al-Shawi, a.k.a the Attorney, said during a funeral held for the victims, "They were all children of Iraq killed during the savage bombing."
  Hypothesis: The Attorney, said at the funeral, "They were all Iraqis killed during the brutal shelling."
  Entailed: True

Task: Comparable Documents (CD)
  Text: Napster, which started as an unauthorized song-swapping Web site, has transformed into a legal service offering music downloads for a monthly fee.
  Hypothesis: Napster illegally offers music downloads.
  Entailed: False

Task: Paraphrase Recognition (PP)
  Text: Kerry hit Bush hard on his conduct on the war in Iraq.
  Hypothesis: Kerry shot Bush.
  Entailed: False

Task: Information Retrieval (IR)
  Text: The country's largest private employer, Wal-Mart Stores Inc., is being sued by a number of its female employees who claim they were kept out of jobs in management because they are women.
  Hypothesis: Wal-Mart sued for sexual discrimination.
  Entailed: True
Table 1: Some Textual Entailment examples. The last three demonstrate some of the harder instances.
1. Collapse Collocations and Named-Entities: We
“collapse” dependency nodes which represent
named entities (e.g., Jeff Bezos in Figure 1) and
also collocations listed in WordNet, including
verbs and their adjacent particles
(e.g., blow off in He blew off his work).
2. Dependency Folding: As in (Lin and Pan-
tel, 2001), we found it useful to fold cer-
tain dependencies (such as modifying preposi-
tions) so that modifiers became labels connect-
ing the modifier’s governor and dependent di-
rectly. For instance, in the text graph in Figure
2, we have changed in from a word into a rela-
tion between its head verb and the head of its
NP complement.
3. Semantic Role Labeling: We also augment
the graph representation with PropBank-style
semantic roles via the system described in
(Toutanova et al., 2005). Each predicate adds
an arc labeled with the appropriate seman-
tic role to the head of the argument phrase.
This helps to create links between words which
share a deep semantic relation not evident in
the surface syntax. Additionally, modifying
phrases are labeled with their semantic types
(e.g., in 1991 is linked by a Temporal edge in
the text graph of Figure 2), which should be
useful in Question Answering tasks.
4. Coreference Links: Using a coreference resolution
tagger, coreference links are added throughout
the graph. These links connect the referent
entity to the referring vertex. In the case of
multiple-sentence texts, they are our only links
in the graph between entities in the two sentences.
For the remainder of the paper, we will refer to
the text as T and hypothesis as H, and will speak
of them in graph terminology. In addition we will
use HV and HE to denote the vertices and edges,
respectively, of H.
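As a concrete illustration of the representation just described, the following minimal sketch encodes the hypothesis graph of our toy example; the class and field names are our own, not the paper's implementation:

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Vertex:
    word: str          # head word or collapsed phrase, e.g. "Jeff Bezos"
    pos: str           # part-of-speech tag, e.g. "VBD"
    ne_type: str = ""  # named-entity type ("person", ...), "" if none

@dataclass
class DepGraph:
    vertices: list = field(default_factory=list)  # H_V: list of Vertex
    edges: dict = field(default_factory=dict)     # H_E: (i, j) -> label

# Hypothesis graph for "Bezos established a company."
h = DepGraph()
h.vertices = [Vertex("establish", "VBD"),
              Vertex("Bezos", "NNP", "person"),
              Vertex("company", "NN", "organization")]
h.edges[(0, 1)] = "subj"  # establish -> Bezos   (Agent)
h.edges[(0, 2)] = "obj"   # establish -> company (Patient)
```

In practice the edge labels would also carry the semantic-role annotations (Agent, Patient, Temporal) added in step 3 above.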
4 Entailment by Graph Matching
We take the view that a hypothesis is entailed by
the text when the cost of matching the hypothesis
graph to the text graph is low. For the remainder of
this section, we outline a general model for assign-
ing a match cost to graphs.
For hypothesis graph H, and text graph T, a
matching M is a mapping from the vertices of H to
those of T. For vertex v in H, we will use M(v) to
denote its “match” in T. As is common in statistical
machine translation, we allow nodes in H to map to
fictitious NULL vertices in T if necessary. Suppose
the cost of matching M is Cost(M). If M is the set
of such matchings, we define the cost of matching
H to T to be
\mathrm{MatchCost}(H, T) = \min_{M \in \mathcal{M}} \mathrm{Cost}(M) \quad (1)
Suppose we have a model, VertexSub(v,M(v)),
which gives us a cost in [0,1] for substituting vertex
v in H for M(v) in T. One natural cost model
is to use the normalized cost for each of the vertex
substitutions in M:
\mathrm{VertexCost}(M) = \frac{1}{Z} \sum_{v \in H_V} w(v)\,\mathrm{VertexSub}(v, M(v)) \quad (2)

Here, w(v) represents the weight or relative importance
of vertex v, and Z = \sum_{v \in H_V} w(v) is
a normalization constant. In our implementation,
the weight of each vertex was based on the part-of-
speech tag of the word or the type of named entity,
if applicable. However, there are several other pos-
sibilities including using TF-IDF weights for words
and phrases.
Notice that when Cost(M) takes the form of
(2), computing MatchCost(H,T) is equivalent to
finding the minimal cost bipartite graph-matching,
which can be efficiently computed using linear pro-
gramming.
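The minimization of (2) can be sketched as a standard assignment problem; this illustrative implementation uses SciPy's `linear_sum_assignment` (the Hungarian algorithm) rather than the linear program mentioned above, and the NULL-padding scheme is our own assumption:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def vertex_match_cost(sub_costs, weights, null_cost=1.0):
    """Minimal-cost vertex matching, posed as an assignment problem.

    sub_costs: (|H_V| x |T_V|) matrix of VertexSub(v, t) values in [0, 1].
    weights:   length-|H_V| array of vertex weights w(v).
    Each hypothesis vertex may also map to a fictitious NULL text vertex;
    we model this by padding the cost matrix with NULL columns.
    """
    n_h, _ = sub_costs.shape
    padded = np.hstack([sub_costs, np.full((n_h, n_h), null_cost)])
    weighted = weights[:, None] * padded
    rows, cols = linear_sum_assignment(weighted)
    # Normalized cost, as in equation (2).
    return weighted[rows, cols].sum() / weights.sum()
```

The padding also keeps the problem feasible when the hypothesis has more vertices than the text.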
We would like our cost-model to incorporate
some measure of how relationships in H are pre-
served in T under M. Ideally, a matching should
preserve all local relationships; i.e., if v → v′ ∈ HE,
then M(v) → M(v′) ∈ TE. When this condition
holds for all edges in H, H is isomorphic to a sub-
graph of T.
What we would like is an approximate notion of
isomorphism, where we penalize the distortion of
each edge relation in H. Consider an edge e =
(v,v′) ∈ HE, and let φM(e) be the path from M(v)
to M(v′) in T.
Again, suppose we have a model,
PathSub(e,φM(e)) for assessing the “cost” of
substituting a direct relation e ∈ HE for its coun-
terpart, φM(e), under the matching. This leads to
a formulation similar to (2), where we consider the
normalized cost of substituting each edge relation
in H with a path in T:
\mathrm{RelationCost}(M) = \frac{1}{Z} \sum_{e \in H_E} w(e)\,\mathrm{PathSub}(e, \phi_M(e)) \quad (3)

where Z = \sum_{e \in H_E} w(e) is a normalization constant.
As in the vertex case, we have weights
for each hypothesis edge, w(e), based upon the
edge’s label; typically subject and object relations
are more important to match than others. Our final
matching cost is given by a convex mixture of
the vertex and relational match costs:

\mathrm{Cost}(M) = \alpha\,\mathrm{VertexCost}(M) + (1 - \alpha)\,\mathrm{RelationCost}(M).

[Figure 2 graphic: the hypothesis graph (establish (VBD), with Subj (Agent) edge to Bezos (person) and Obj (Patient) edge to company (organization)) matched against the text graph (found (VBD), with Subj (Agent) edge to Jeff Bezos (person), Obj (Patient) edge to Amazon.com (organization), and In (Temporal) edge to 1991 (date)). The optimal matching pairs establish with found (synonym match), Bezos with Jeff Bezos (exact match), and company with Amazon.com (hyponym match). Vertex Cost: (0.0 + 0.2 + 0.4)/3 = 0.2; Relation Cost: 0 (graphs isomorphic); Match Cost: 0.55(0.2) + 0.45(0.0) = 0.11.]

Figure 2: Example graph matching (α = 0.55) for the example pair. Dashed lines represent the optimal matching.
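As a sanity check, the convex mixture reproduces the numbers given in Figure 2:

```python
def match_cost(vertex_cost, relation_cost, alpha=0.55):
    """Cost(M) = alpha * VertexCost(M) + (1 - alpha) * RelationCost(M)."""
    return alpha * vertex_cost + (1 - alpha) * relation_cost

# Figure 2's worked example: VertexCost = 0.2, RelationCost = 0
# (the graphs are isomorphic), alpha = 0.55.
cost = match_cost(0.2, 0.0)   # 0.55 * 0.2 = 0.11
```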
Notice that minimizing Cost(M) is computa-
tionally hard since if our PathSub model as-
signs zero cost only for preserving edges, then
RelationCost(M) = 0 if and only if H is isomorphic
to a subgraph of T. Since subgraph isomorphism is
an NP-complete problem, we cannot hope to have an
efficient exact procedure for minimizing the graph
matching cost. As an approximation, we can ef-
ficiently find the matching M∗ which minimizes
VertexCost(·); we then perform local greedy hill-
climbing search, beginning from M∗, to approxi-
mate the minimal matching. The allowed operations
are changing the assignment of any hypothesis ver-
tex to a text one and, to avoid ridges, swapping two
hypothesis assignments.
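The local search just described might be sketched as follows; the move set (single reassignment plus pairwise swap) follows the text, while the function signature and stopping criterion are our own assumptions:

```python
import itertools

def hill_climb(match, cost_fn, text_vertices, max_iters=100):
    """Greedy local search over matchings, starting from the matching
    `match` (dict: hypothesis vertex -> text vertex, or None for NULL)
    that minimizes VertexCost. `cost_fn` scores a full matching.
    Moves: reassign one hypothesis vertex; swap two assignments."""
    best, best_cost = dict(match), cost_fn(match)
    for _ in range(max_iters):
        improved = False
        hyp_vertices = list(best)
        # Move 1: reassign a single hypothesis vertex.
        for v in hyp_vertices:
            for t in list(text_vertices) + [None]:
                cand = dict(best)
                cand[v] = t
                c = cost_fn(cand)
                if c < best_cost:
                    best, best_cost, improved = cand, c, True
        # Move 2: swap two hypothesis assignments (avoids ridges).
        for v1, v2 in itertools.combinations(hyp_vertices, 2):
            cand = dict(best)
            cand[v1], cand[v2] = cand[v2], cand[v1]
            c = cost_fn(cand)
            if c < best_cost:
                best, best_cost, improved = cand, c, True
        if not improved:
            break
    return best, best_cost
```

Because each move only ever lowers the cost, the search terminates at a local minimum of the matching cost.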
5 Node and Edge Substitution Models
In the previous section we described our graph
matching model in terms of our VertexSub model,
which gives a cost for substituting one graph vertex
for another, and PathSub, which gives a cost for sub-
stituting the path relationship between two vertices in
one graph for that in another. We now outline these
models.
5.1 Vertex substitution cost model
Our VertexSub(v,M(v)) model is based upon a
sliding scale, where progressively higher costs are
given based upon the following conditions:
• Exact Match: v and M(v) are identical words/
phrases.
• Stem Match: v and M(v)’s stems match or one
is a derivational form of the other; e.g., matching
coaches to coach.
• Synonym Match: v and M(v) are synonyms ac-
cording to WordNet (Fellbaum, 1998). In particu-
lar we use the top 3 senses of both words to deter-
mine synsets.
• Hypernym Match: v is a “kind of” M(v), as
determined by WordNet. Note that this feature is
asymmetric.
• WordNet Similarity: v and M(v) are similar ac-
cording to WordNet::Similarity (Peder-
sen et al., 2004). In particular, we use the measure
described in (Resnik, 1995). We found it useful
to only use similarities above a fixed threshold to
ensure precision.
• LSA Match: v and M(v) are distributionally
similar according to a freely available Latent Se-
mantic Indexing package,2 or for verbs similar
according to VerbOcean (Chklovski and Pantel,
2004).
• POS Match: v and M(v) have the same part of
speech.
• No Match: M(v) is NULL.
Although the above conditions often produce rea-
sonable matchings between text and hypothesis, we
found the recall of these lexical resources to be far
from adequate. More robust lexical resources would
almost certainly boost performance.
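A sliding-scale cost of this kind can be sketched as below; the numeric costs and the `lexicon` interface are hypothetical stand-ins for the paper's tuned values and lexical resources:

```python
# The cost values and the `lexicon` interface are illustrative only;
# the paper's actual costs are hand-set or learned (Section 5.3).
def vertex_sub(v, t, lexicon):
    """Sliding-scale substitution cost for matching hypothesis vertex v
    to text vertex t (None means the fictitious NULL vertex).
    Conditions are checked from cheapest to costliest."""
    if t is None:                            return 1.0  # No Match
    if v == t:                               return 0.0  # Exact Match
    if lexicon.stem(v) == lexicon.stem(t):   return 0.1  # Stem Match
    if lexicon.synonyms(v, t):               return 0.2  # Synonym Match
    if lexicon.hypernym(v, t):               return 0.3  # Hypernym (asymmetric)
    if lexicon.wn_similarity(v, t) > 0.5:    return 0.4  # WordNet::Similarity
    if lexicon.lsa_similar(v, t):            return 0.5  # LSA / VerbOcean
    if lexicon.pos(v) == lexicon.pos(t):     return 0.8  # POS Match only
    return 1.0
```

Note the asymmetry of the hypernym test mirrors the asymmetry of entailment itself: a hypothesis "kind of" term may generalize a text term, but not vice versa.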
5.2 Path substitution cost model
Our PathSub(v → v′,M(v) → M(v′)) model is
also based upon a sliding scale cost based upon the
following conditions:
• Exact Match: M(v) → M(v′) is an edge in
T with the same label.
• Partial Match: M(v) → M(v′) is an edge in
T, not necessarily with the same label.
• Ancestor Match: M(v) is an ancestor of M(v′).
We use an exponentially increasing cost for longer
distance relationships.
2Available at http://infomap.stanford.edu
• Kinked Match: M(v) and M(v′) share a com-
mon parent or ancestor in T. We use an exponen-
tially increasing cost based on the maximum of
the nodes' distances to their least common ancestor
in T.
These conditions capture many of the common
ways in which relationships between entities are dis-
torted in semantically related sentences. For in-
stance, in our system, a partial match will occur
whenever an edge type differs in detail, for instance
use of the preposition towards in one case and to in
the other. An ancestor match will occur whenever an
indirect relation leads to the insertion of an interven-
ing node in the dependency graph, such as matching
John is studying French farming vs. John is studying
French farming practices.
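The four conditions above can be sketched as a single scoring function; the `Path` summary type and all numeric costs are our own illustrative choices, not the paper's tuned values:

```python
from dataclasses import dataclass

@dataclass
class Path:
    """Summary of phi_M(e), the path from M(v) to M(v') in T."""
    is_edge: bool = False            # direct edge in T
    label: str = ""                  # edge label, if a direct edge
    is_ancestor: bool = False        # M(v) is an ancestor of M(v')
    length: int = 1                  # path length, for ancestor matches
    has_common_ancestor: bool = False
    dist_v: int = 0                  # distances to least common ancestor
    dist_vprime: int = 0

def path_sub(edge_label, path, base_costs=(0.0, 0.2, 0.3, 0.4)):
    """Sliding-scale cost for substituting hypothesis edge e, labeled
    `edge_label`, with the text path `path`."""
    exact, partial, ancestor, kinked = base_costs
    if path.is_edge and path.label == edge_label:
        return exact                              # Exact Match
    if path.is_edge:
        return partial                            # Partial Match
    if path.is_ancestor:
        # Exponentially increasing cost in the path length.
        return min(1.0, ancestor * 2.0 ** (path.length - 1))
    if path.has_common_ancestor:
        d = max(path.dist_v, path.dist_vprime)
        return min(1.0, kinked * 2.0 ** (d - 1))  # Kinked Match
    return 1.0
```

The exponential growth ensures that long, indirect paths (e.g., through several inserted nodes) quickly approach the cost of no relation at all.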
5.3 Learning Weights
Is it possible to learn weights for the relative impor-
tance of the conditions in the VertexSub and PathSub
models? Consider the case where match costs are
given only by equation (2) and vertices are weighted
uniformly (w(v) = 1). Suppose that Φ(v,M(v))
is a vector of features3 indicating the cost accord-
ing to each of the conditions listed for matching v
to M(v). Also let w be weights for each element
of Φ(v,M(v)). First we can model the substitution
cost for a given matching as:
\mathrm{VertexSub}(v, M(v)) = \frac{\exp(w^T \Phi(v, M(v)))}{1 + \exp(w^T \Phi(v, M(v)))}
Letting s(·) be the sigmoid function appearing on the
right-hand side of the equation above, our final
matching cost as a function of w is given by
c(H, T; w) = \min_{M \in \mathcal{M}} \frac{1}{|H_V|} \sum_{v \in H_V} s(w^T \Phi(v, M(v))) \quad (4)
Suppose we have a set of text/hypothesis pairs,
{(T(1),H(1)),...,(T(n),H(n))}, with labels y(i)
which are 1 if H(i) is entailed by T(i) and 0
otherwise. Then we would like to choose w to
minimize costs for entailed examples and maximize
it for non-entailed pairs:
3In the case of our “match” conditions, these features will
be binary.
\ell(w) = \sum_{i:\,y^{(i)}=1} \log\left(1 - c(H^{(i)}, T^{(i)}; w)\right) + \sum_{i:\,y^{(i)}=0} \log c(H^{(i)}, T^{(i)}; w)
Unfortunately, ℓ(w) is not a convex function. No-
tice that the cost of each matching, M, implicitly
depends on the current setting of the weights w. It
can be shown that since each c(H,T;w) involves
minimizing M ∈ M, which depends on w, it is not
convex. Therefore, we can’t hope to globally opti-
mize our cost functions over w and must settle for
an approximation.
One approach is to use coordinate ascent over M
and w. Suppose that we begin with arbitrary weights
and given these weights choose M(i) to minimize
each c(H(i),T(i);w). Then we use a relaxed form of
the cost function where we use the matchings found
in the last step:
\hat{c}(H^{(i)}, T^{(i)}; w) = \frac{1}{|H_V|} \sum_{v \in H_V} s(w^T \Phi(v, M^{(i)}(v)))
Then we maximize ℓ(w) with respect to w, with
each c(·) replaced by the relaxed cost function ĉ(·). This
step involves only logistic regression. We repeat this
procedure until our weights converge.
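The w-step of this alternating procedure can be sketched as plain gradient ascent on the relaxed objective; the learning rate, iteration count, and sign convention (entailed pairs pushed toward low cost, per the text) are our own illustrative choices:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def relaxed_cost(w, phis):
    """c-hat for one pair: the mean sigmoid cost over its vertices.
    `phis` is the (|H_V| x d) matrix of Phi(v, M(v)) features under
    the matching fixed in the previous step."""
    return sigmoid(phis @ w).mean()

def w_step(w, pairs, labels, lr=0.1, iters=200):
    """One w-update of the coordinate ascent: gradient ascent on the
    relaxed objective with matchings held fixed. Entailed pairs
    (y = 1) are pushed toward low cost, non-entailed toward high."""
    w = w.copy()
    for _ in range(iters):
        grad = np.zeros_like(w)
        for phis, y in zip(pairs, labels):
            s = sigmoid(phis @ w)            # per-vertex costs
            c = s.mean()
            # d c-hat / d w, averaged over the pair's vertices.
            dc = ((s * (1 - s))[:, None] * phis).mean(axis=0)
            grad += -dc / (1 - c) if y == 1 else dc / c
        w += lr * grad
    return w
```

In the full procedure, each `w_step` would be followed by re-running the graph matcher to refresh the matchings M(i) under the new weights.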
To test the effectiveness of the above procedure
we compared performance against baseline settings
using a random split on the development set. Picking
each weight uniformly at random resulted in 53%
accuracy. Setting all weights identically to an arbi-
trary value gave 54%. The procedure above, where
the weights are initialized to the same value, resulted
in an accuracy of 57%. However, we believe there
is still room for improvement, since carefully
hand-chosen weights give performance comparable
to the learned weights on the final test set. We be-
lieve this setting of learning under matchings is a
rather general one and could be beneficial to other
domains such as Machine Translation. In the future,
we hope to find better approximation techniques for
this problem.
6 Checks
One systematic source of error coming from our ba-
sic approach is the implicit assumption of upwards
monotonicity of entailment; i.e., if T entails H then
adding more words to T should also give us a sen-
tence which entails H. This assumption, also made
by other recent abductive approaches (Moldovan et
al., 2003), does not hold for several classes of exam-
ples. Our formalism does not at present provide a
general solution to this issue, but we include special
case handling of the most common types of cases,
which we outline below.4 These checks are done af-
ter graph matching and assume we have stored the
minimal cost matching.
Negation Check
Text: Clinton’s book is not a bestseller
Hypothesis: Clinton’s book is a bestseller
To catch such examples, we check that each hy-
pothesis verb is not matched to a text word which
is negated (unless the verb pairs are antonyms) and
vice versa. In this instance, the is in H, denoted by
isH, is matched to isT which has a negation modifier,
notT , absent for isH. So the negation check fails.
Factive Check
Text: Clonaid claims to have cloned 13 babies worldwide.
Hypothesis: Clonaid has cloned 13 babies.
Non-factive verbs (claim, think, charged, etc.) in
contrast to factive verbs (know, regret, etc.) have
sentential complements which do not represent true
propositions. We detect such cases by checking that
each verb in H that is matched in T does not have a
non-factive verb for a parent.
Superlative Check
Text: The Osaka World Trade Center is the tallest building in
Western Japan.
Hypothesis: The Osaka World Trade Center is the tallest build-
ing in Japan.
In general, superlative modifiers (most, biggest,
etc.) invert the typical monotonicity of entailment
and must be handled as special cases. For any
noun n with a superlative modifier (part-of-speech
JJS) in H, we must ensure that all modifier relations
of M(n) are preserved in H. In this example, build-
ingH has a superlative modifier tallestH, so we must
ensure that each modifier relation of JapanT , a noun
4All the examples are actual, or slightly altered, RTE exam-
ples.
dependent of buildingT, is preserved in H. Since
JapanT has the modifier WesternT, which is absent
in H, the pair fails the superlative check.

Method        Accuracy  CWS
Random        50.0%     0.500
Bag-Of-Words  49.5%     0.548
TF-IDF        51.8%     0.560
GM-General    56.8%     0.614
GM-ByTask     56.7%     0.620

Table 2: Accuracy and confidence weighted score
(CWS) for the test set using various techniques.
Additionally, during error analysis on the devel-
opment set, we spotted the following cases where
our VertexSub function erroneously labeled vertices
as similar, and required special case consideration:
• Antonym Check: We consistently found that the
WordNet::Similarity modules gave high
similarity to antonyms.5 We explicitly check
whether a matching involved antonyms and reject
unless one of the vertices had a negation modifier.
• Numeric Mismatch: Since numeric expressions
typically have the same part-of-speech tag (CD),
they were typically matched when exact matches
could not be found. However, mismatching nu-
merical tokens usually indicated that H was not
entailed, and so pairs with a numerical mismatch
were rejected.
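A minimal version of the numeric-mismatch check might look like the following; the tokenization and the regular expression are our own simplifications:

```python
import re

_NUMERIC = re.compile(r"\d[\d.,]*")

def numeric_mismatch(hyp_tokens, text_tokens):
    """Reject a pair when the hypothesis contains a numeric token that
    never appears in the text; such mismatches usually signal
    non-entailment. Tokens are assumed to be pre-tokenized strings."""
    nums = lambda toks: {t for t in toks if _NUMERIC.fullmatch(t)}
    return bool(nums(hyp_tokens) - nums(text_tokens))
```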
7 Experiments and Results
For our experiments we used the development and
test sets from the Recognizing Textual Entailment
challenge (Dagan et al., 2005). We give results for
our system as well as for the following systems:
• Bag-Of-Words: We tokenize the text and hypoth-
esis and strip the function words, and stem the re-
sulting words. The cost is given by the fraction of
the hypothesis not matched in the text.
• TF-IDF: Similar to Bag-Of-Words except that
there is a tf.idf weight associated with each hy-
pothesis word so that more "important" words carry
higher weight in matching.
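The two baselines share one cost computation; this sketch is our reading of the descriptions above, with the optional `weight` hook standing in for the tf.idf weights:

```python
def bow_cost(text_tokens, hyp_tokens, weight=None):
    """Bag-of-words match cost: the (weighted) fraction of hypothesis
    tokens not found in the text. Tokens are assumed to be already
    stemmed, with function words stripped."""
    weight = weight or (lambda tok: 1.0)
    text = set(text_tokens)
    missed = sum(weight(tok) for tok in hyp_tokens if tok not in text)
    total = sum(weight(tok) for tok in hyp_tokens)
    return missed / total if total else 0.0
```

With `weight=None` this is the Bag-Of-Words baseline; passing a tf.idf lookup as `weight` yields the TF-IDF variant.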
5This isn’t necessarily incorrect, but is simply not suitable
for textual inference.
Task   GM-General          GM-ByTask
       Accuracy   CWS      Accuracy   CWS
CD     72.0%      0.742    76.0%      0.771
IE     55.9%      0.583    55.8%      0.595
IR     52.2%      0.564    51.1%      0.572
MT     50.0%      0.497    43.3%      0.489
PP     58.0%      0.741    58.0%      0.746
QA     53.8%      0.537    55.4%      0.556
RC     52.1%      0.539    52.9%      0.523

Table 3: Accuracy and confidence weighted score
(CWS) split by task on the RTE test set.
We also present results for two graph matching
(GM) systems. The GM-General system fits a sin-
gle global threshold from the development set. The
GM-ByTask system fits a different threshold for
each of the tasks.
Our results are summarized in Table 2. As the re-
sult indicates, the task is particularly hard; all RTE
participants scored between 50% and 60% in terms
of overall accuracy (Dagan et al., 2005). Nevertheless,
both GM systems perform better than either
Bag-Of-Words or TF-IDF. CWS refers to Confidence
Weighted Score (also known as average precision).
This is perhaps a more insightful measure, since it
takes into account a ranking of answers by confidence
and assesses whether we are correct on the pairs we
are most confident about. To compute CWS, our
n answers are sorted in decreasing order of confidence,
and then for each i, we calculate a_i, our accuracy
on our i most confident predictions. Then

\mathrm{CWS} = \frac{1}{n} \sum_{i=1}^{n} a_i.
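The CWS computation can be sketched directly from this definition:

```python
def cws(confidences, correct):
    """Confidence weighted score: sort predictions by decreasing
    confidence, then average the running accuracies a_i over all
    prefixes of the ranking."""
    order = sorted(range(len(confidences)), key=lambda i: -confidences[i])
    hits, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        hits += int(correct[i])
        total += hits / rank   # a_i: accuracy on the i most confident
    return total / len(order)
```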
We also present results on a per-task basis in Ta-
ble 3. Interestingly, there is a large variation in per-
formance depending on the task.
8 Conclusion
We have presented a learned graph matching ap-
proach to approximating textual entailment which
outperforms models which only match at the word
level, and is competitive with recent weighted abduction
models (Moldovan et al., 2003). In addition,
we explore problematic cases of nonmonotonicity in
entailment, which are not naturally handled by ei-
ther subgraph matching or the so-called “logic form”
Text: A Filipino hostage in Iraq was released.
  Hypothesis: A Filipino hostage was freed in Iraq.
  True Ans.: True   Our Ans.: True   Conf: 0.84
  Comments: Verb rewrite is handled. Phrasal ordering does not affect cost.

Text: The government announced last week that it plans to raise oil prices.
  Hypothesis: Oil prices drop.
  True Ans.: False  Our Ans.: False  Conf: 0.95
  Comments: High cost given for substituting word for its antonym.

Text: Shrek 2 rang up $92 million.
  Hypothesis: Shrek 2 earned $92 million.
  True Ans.: True   Our Ans.: False  Conf: 0.59
  Comments: Collocation "rang up" is not known to be similar to "earned".

Text: Sonia Gandhi can be defeated in the next elections in India by BJP.
  Hypothesis: Sonia Gandhi is defeated by BJP.
  True Ans.: False  Our Ans.: True   Conf: 0.77
  Comments: "can be" does not indicate the complement event occurs.

Text: Fighters loyal to Moqtada al-Sadr shot down a U.S. helicopter Thursday in the holy city of Najaf.
  Hypothesis: Fighters loyal to Moqtada al-Sadr shot down Najaf.
  True Ans.: False  Our Ans.: True   Conf: 0.67
  Comments: Should recognize a non-Location cannot be substituted for a Location.

Text: C and D Technologies announced that it has closed the acquisition of Datel, Inc.
  Hypothesis: Datel Acquired C and D technologies.
  True Ans.: False  Our Ans.: True   Conf: 0.64
  Comments: Failed to penalize the switch in semantic role structure enough.

Table 4: Analysis of results on some RTE examples along with our guesses and confidence probabilities.
inference of (Moldovan et al., 2003) and have pro-
posed a way to capture common cases of this phe-
nomenon. We believe that the methods employed
in this work show much potential for improving the
state-of-the-art in computational semantic inference.
9 Acknowledgments
Many thanks to Rajat Raina, Christopher Cox,
Kristina Toutanova, Jenny Finkel, Marie-Catherine
de Marneffe, and Bill MacCartney for providing us
with linguistic modules and useful discussions. This
work was supported by the Advanced Research and
Development Activity (ARDA)’s Advanced Ques-
tion Answering for Intelligence (AQUAINT) pro-
gram.
References
Timothy Chklovski and Patrick Pantel. 2004. VerbO-
cean: Mining the web for fine-grained semantic verb
relations. In EMNLP.
Michael Collins. 1999. Head-driven statistical models
for natural language parsing. Ph.D. thesis, University
of Pennsylvania.
Ido Dagan, Oren Glickman, and Bernardo Magnini.
2005. The PASCAL recognizing textual entailment
challenge. In Proceedings of the PASCAL Challenges
Workshop Recognizing Textual Entailment.
C. Fellbaum. 1998. WordNet: An Electronic Lexical
Database. MIT Press.
Dan Klein and Christopher D. Manning. 2003. Accurate
unlexicalized parsing. In ACL, pages 423–430.
Dekang Lin and Patrick Pantel. 2001. DIRT - discovery
of inference rules from text. In Knowledge Discovery
and Data Mining, pages 323–328.
Dan I. Moldovan, Christine Clark, Sanda M. Harabagiu,
and Steven J. Maiorano. 2003. Cogex: A logic prover
for question answering. In HLT-NAACL.
K. Papineni, S. Roukos, T. Ward, and W. Zhu. 2002.
Bleu: a method for automatic evaluation of machine
translation. In ACL.
Ted Pedersen, Siddharth Patwardhan, and Jason Michelizzi.
2004. WordNet::Similarity – measuring the relatedness
of concepts. In AAAI.
Rajat Raina, Aria Haghighi, Christopher Cox, Jenny
Finkel, Jeff Michels, Kristina Toutanova, Bill Mac-
Cartney, Marie-Catherine de Marneffe, Christopher D.
Manning, and Andrew Y. Ng. 2005a. Robust textual
inference using diverse knowledge sources. In Pro-
ceedings of the First PASCAL Challenges Workshop.
Southampton, UK.
Rajat Raina, Andrew Y. Ng, and Christopher D. Man-
ning. 2005b. Robust textual inference via learning and
abductive reasoning. In Proceedings of AAAI 2005.
AAAI Press.
Philip Resnik. 1995. Using information content to evalu-
ate semantic similarity in a taxonomy. In IJCAI, pages
448–453.
Kristina Toutanova, Aria Haghighi, and Christopher Manning.
2005. Joint learning improves semantic role labeling.
In Association for Computational Linguistics (ACL).
