Proceedings of the Human Language Technology Conference of the North American Chapter of the ACL, pages 57–64,
New York, June 2006. c©2006 Association for Computational Linguistics
Acquiring Inference Rules with TemporalConstraints
by Using Japanese Coordinated Sentences and Noun-Verb Co-occurrences
Kentaro Torisawa
Japan Advanced Institute of Science and Technology
1-1Asahidai, Nomi-shi, Ishikawa-ken, 923-1211JAPAN
torisawa@jaist.ac.jp
Abstract
This paper shows that inference rules with
temporal constraints can be acquired by us-
ing verb-verb co-occurrences in Japanese
coordinated sentences and verb-noun co-
occurrences. For example, our unsuper-
vised acquisition method could obtain the
inference rule �If someone enforces a law,
usually someone enacts the law at the same
time as or before the enforcing of the
law� since the verbs �enact� and �enforce�
frequently co-occurred in coordinated sen-
tences and the verbs also frequently co-
occurred with the noun �law�. We also
show that the accuracy of the acquisition
is improved by using the occurrence fre-
quency of a single verb, which we assume
indicates how generic the meaning of the
verb is.
1 Introduction
Our goal is to develop an unsupervised method for
acquiring inference rules that describe logical impli-
cations between event occurrences. As clues to �nd
the rules, we chose Japanese coordinated sentences,
which typically report two events that occur in a cer-
tain temporal order. Of course, not every coordi-
natedsentence necessarily expresses implications. We
found, though, that reliable rules can be acquired by
looking at co-occurrence frequencies between verbs
in coordinated sentences and co-occurrences between
verbs and nouns. For example, our method could ob-
taintherule�Ifsomeoneenforcesalaw,usuallysome-
one enacts the law at the same time as or before the
enforcing of the law�. In our experiments, when our
method produced 400 rules for 1,000 given nouns,
70% of the rules were considered proper by at least
three of fourhuman judges.
Note that the acquired inference rules pose tempo-
ral constraints on occurrences of the events described
in the rules. In the �enacting-and-enforcing-law� ex-
ample, the constraints were expressed by the phrase
�at the same time as or beforethe event of�. We think
suchtemporally constrained rulesshould bebene�cial
invarioustypesofNLPapplications. Therulesshould
allow Q&A systems to guess or restrict the time at
which a certain event occurs even if they cannot di-
rectly �nd the time in given documents. In addition,
we foundthat a large part of the acquired rules can be
regarded as paraphrases, and many possible applica-
tionsofparaphrases shouldalsobetarget applications.
To acquire rules, our method uses a score, which is
basically an approximation ofthe probability that par-
ticular coordinated sentences will be observed. How-
ever, it is weighted by a bias, which embodies our as-
sumption that frequently observed verbs are likely to
appear as the consequence of a proper inference rule.
This is based on our intuition that frequently appear-
ingverbshave a generic meaningandtendto describe
a wide range of situations, and that natural language
expressions referring to a wide range of situations are
more likely to be a consequence of a proper rule than
speci�c expressions describing only a narrowrangeof
events. A similar idea relying on word co-occurrence
was proposed by Geffet and Dagan (Geffet and Da-
gan, 2005)but ourmethodis simpler and we expect it
to be applicable to a wider range of vocabularies.
Research on the automatic acquisition of inference
rules, paraphrases and entailments has received much
attention. Previous attempts have used, for instance,
the similarities between case frames (Lin and Pan-
57
tel, 2001), anchor words (Barzilay and Lee, 2003;
Shinyama et al., 2002; Szepektor et al., 2004), and a
web-based method(Szepektor et al., 2004;Geffet and
Dagan, 2005). There is also a workshop devoted to
thistask (Daganetal.,2005). Theobtained accuracies
have still been low, however, and we think searching
for other clues, such as coordinated sentences and the
bias wehave just mentioned, isnecessary. Inaddition,
research has also been done on the acquisition of the
temporal relations (Fujiki et al., 2003; Chklovski and
Pantel, 2004) by using coordinated sentences as we
did, but these worksdid not consider the implications
between events.
2 Algorithmwitha Simpli�ed Score
In the following, we begin by providing an overview
of our algorithm. We specify the basic steps in the al-
gorithm and the form of the rules to be acquired. We
also examine the direction of implications andtempo-
ral ordering described by the rules. After that, we de-
scribe a simpli�ed version of the scoring function that
ouralgorithm uses and then discuss a problem related
to it. Thebias mechanism, which wementioned in the
introduction, is described in the section after that.
2.1 Procedure andGenerated Inference Rules
Our algorithm is given a noun as its input and pro-
duces a set of inference rules. A produced rule ex-
presses an implication relation between two descrip-
tions including the noun. Our basic assumptions for
the acquisition can be stated as follows.
• If verbs v1 and v2 frequently co-occur in coordi-
natedsentences, theverbsrefer totwoevents that
actually frequently co-occur in the real world,
andasentence including v1 andanother sentence
including v2 are good candidates to be descrip-
tions that have an implication relation and a par-
ticular temporal order between them.
• The above tendency becomes stronger when the
verbs frequently co-occur with a given noun n;
i.e., if v1 and v2 frequently co-occur in coordi-
nated sentences andthe verbs also frequently co-
occur with a nounn, a sentence including v1 and
n and another sentence including v2 and n are
good candidates to be descriptions that have an
implication relation between them.
Our procedure consists of the following steps.
Step1 Select M verbs that take a given noun n as
their argument most frequently.
Step2 For each possible pair of the selected verbs,
compute the value of a scoring function that em-
bodies our assumptions, and select the N verb
pairs that have the largest score values. Note
thatweexcludethecombinationofthesameverb
fromthe pairs to be considered.
Step3 If the score value fora verb pair is higher than
athreshold θ andtheverbstake nastheir syntac-
tic objects, generate an inference rule from the
verb pair and the noun.
Note that we used 500 as the value of M. N was set
to 4 and θ was set to various values during our ex-
periments. Another important point is that, in Step 3,
the argument positions at which the given noun can
appear is restricted to syntactic objects. This was be-
cause we empirically found that the rules generated
fromsuch verb-nounpairs were relatively accurate.
Assume that a given noun is �goods� and the verb
pair �sell� and �manufacture� is selected in Step 3.
Then, the following rule is generated.
• If someone sells goods, usually someone manu-
factures the goods at the same time as or before
the event of the selling of the goods.
Although the word �someone� occurs twice, we do
not demand that it refers to the same person in both
instances. It just works as a placeholder. Also note
that the adverb �usually�1 was inserted to prevent the
rule frombeingregarded as invalid byconsidering sit-
uationsthat arelogically possible butunlikely inprac-
tice.
The above rule is produced when �manufacture�
and �sell� frequently co-occur in coordinated sen-
tences such as �The company manufactured goods
and it sold them�. One might be puzzled because the
order of the occurrences of the verbs in the coordi-
nated sentences isreversed in therule. Theverb�sell�
in the second(embedded) sentence/clause in the coor-
dinated sentence appears as a verb in the precondition
of the rule, while �manufacture� in the �rst (embed-
ded) sentence/clause is the verb in the consequence.
A question then, is why we chose such an order,
or such a direction of implication. There is another
possibility, which might seem more straightforward.
From the same coordinated sentences, we could pro-
duce the rule where the direction is reversed; i.e,., �If
someone manufactures goods, usually someone sells
1We used �futsuu� as a Japanese translation.
58
the goodsat the same time as or after the manufactur-
ing�. The difference is that the rules generated by our
procedure basically infer a past event from another
event, while the rules with the opposite direction have
to predict a future event. In experiments using ourde-
velopment set, we observed that the rules predicting
future events were often unacceptable because of the
uncertainty that weusually encounter inpredictingthe
future or achieving a future goal. For instance, peo-
ple might do something (e.g., manufacturing) with an
intention to achieve some other goal (e.g., selling) in
thefuture. Butthey sometimes fail to achieve their fu-
ture goal for some reason. Some manufactured goods
are never sold because, forinstance, they are notgood
enough. In our experiments, we found that the preci-
sion rates of the rules with the direction we adopted
were much higher than those of the rules with the op-
posite direction.
2.2 Simpli�edScoring Function
To be precise, a rule generated by our method has the
following form, where vpre and vcon are verbs and n
is a given noun.
• If someonevpre n, usually someonevcon thenat
the same time as or before the vpre-ing of the n.
We assume that all three occurrences of nounn in the
rule refer to the same entity.
Now, we de�ne a simpli�ed version of our scoring
function as follows.
BasicS(n,vcon,vpre,arg,argprime) =
Pcoord(vcon,vpre)Pargprime(n|vpre)Parg(n|vcon)/P(n)2
Here, Pcoord(vcon,vpre) is the probability that vcon
and vpre are observed in coordinated sentences in a
way that the event described by vcon temporally pre-
cedes or occurs at the same time as the event de-
scribed by vpre. (More precisely, vcon and vpre must
be the main verbs of two conjuncts S1 and S2 in a
Japanese coordinated sentence that is literally trans-
lated to the form �S1 and S2�.) This means that in
the coordinated sentences, vcon appears �rst and vpre
second. Pargprime(n|vpre)andParg(n|vcon)arethecondi-
tional probabilities that noccupies the argument posi-
tionsargprime ofvpre andarg ofvcon, respectively. At the
beginning, as possible argument positions, we speci-
�ed �ve argument positions, including the syntactic
object and the subject. Note that when vpre and vcon
frequently co-occur in coordinated sentences and n
often becomes arguments of vpre and vcon, the score
has a large value. This means that the score embodies
our assumptions for acquiring rules.
Theterm Pcoord(vcon,vpre)Pargprime(n|vpre)Parg(n|vcon) in
BasicS is actually an approximation of the proba-
bility P(vpre,argprime,n,vcon,arg,n) that we will ob-
serve the coordinated sentences such that the two sen-
tences/clauses in the coordinated sentence are headed
by vpre and vcon and n occupies the argument posi-
tions argprime of vpre and arg of vcon. Another important
point is that the score is divided byP(n)2. This is be-
cause theprobabilities such asParg(n|vcon)tendtobe
large for a frequently observed noun n. The division
by P(n)2 is done to cancel such a tendency. This di-
vision does not affect the ranking for the same noun,
but, since we give a uniform threshold for selecting
the verb pairs for distinct nouns, such normalization
isdesirable, aswecon�rmed inexperiments usingour
development set.
2.3 Paraphrases andCoordinated Sentences
Thus, we have de�ned our algorithm and a simpli�ed
scoring function. Now let us discuss a problem that is
caused by the scoring function.
As mentioned in the introduction, a large por-
tion of the acquired rules actually consists of para-
phrases. Here, by a paraphrase, we mean a rule con-
sisting of two descriptions referring to an identical
event. The following example is an English transla-
tion of such paraphrases obtained by our method. We
think this rule is acceptable. Note that we invented a
new English verb �clearly-write� as a translation of a
Japanese verbmeiki-suruwhile �write� is a trans-
lation of another Japanese verb kaku.
• If someone clearly-writes a phone number, usu-
ally someone writes the phone number at the
same time as or before the clearly-writing of the
phonenumber.
Note that �clearly-write� and �write� have almost the
same meaning but the former is often used in texts
related to legal matters. Evidently, in the above rule,
�clearly-write� and �write� describe the same event,
and it can be seen as a paraphrase. There are two
types ofcoordinated sentence that ourmethodcan use
as clues to generate the rule.
• He clearly-wrote a phone number and wrote the
phonenumber.
• Heclearly-wrote aphonenumber, andalsowrote
an address.
The �rst sentence is more similar to the inference
rule than the second in the sense that the two verbs
59
share the same object. However, it is ridiculous be-
cause it describes the same event twice. Such a sen-
tence is not observed frequently in corpora, and will
not be used as clues to generate rules in practice.
On the other hand, we frequently observe sen-
tences of the second type in corpora, and our method
generates the paraphrases from the verb-verb co-
occurrences taken from such sentences. However,
there is a mismatch between the sentence and the ac-
quired rule in the sense that the rule describes two
events related to the same object (i.e., a phone num-
ber), while the above sentence describes two events
that are related to distinct objects (i.e., a phone num-
ber and an address). Regarding this mismatch, two
questions need to be addressed.
The �rst question is why our method can acquire
the rule despite the mismatch. The answer is that
ourmethodobtains theverb-verb co-occurrence prob-
abilities (Pcoord(vcon,vpre)) and the verb-noun co-
occurrence probabilities (e.g.,Parg(n|vcon)) indepen-
dently, and that the method does not check whether
the two verbs share an argument.
Then the next question is why our method can
acquire accurate paraphrases from such coordinated
sentences. Though we do not have a de�nite answer
now, ourhypothesis is related to the strategy that peo-
ple adoptin writing coordinated sentences. When two
similar but distinct events, which can be described by
the same verb,occur successively or at the same time,
people avoid repeating the same verb to describe the
two events in a single sentence. Instead they try to
use distinct verbs that have similar meanings. Sup-
pose that a person wrote his name and address. To
report what she did, she may write �I clearly-wrote
my name and also wrote my address� but will seldom
write�Iclearly-wrote mynameandalsoclearly-wrote
my address�. Thus, we can expect to be able to �nd
in coordinated sentences a large number of verb pairs
consisting of two verbs with similar meanings. Note
that our method tends to produce two verbs that fre-
quently co-occur withagiven noun. Thisalso helpsto
produce the inference rules consisting of two seman-
tically similar verbs.
3 BiasMechanism
We now describe a bias used in our full scoring func-
tion, which signi�cantly improves the precision. The
full scoring function is de�ned as
Score(n,vcon,vpre,arg,argprime) =
Parg(vcon)BasicS(n,vcon,vpre,arg,argprime).
The bias is denoted as Parg(vcon), which is the prob-
ability that we can observe the verb vcon, which is the
verb in the consequence of the rule, and its argument
position arg is occupied by a noun, no matter which
nounactually occupies the position.
An intuitive explanation of the assumption behind
this bias is that as the situation within which the de-
scriptionoftheconsequence inaruleisvalid becomes
wider, the rule becomes more likely to be a proper
one. Consider the following rules.
• If someone demands a compensation payment,
someone orders the compensation payment.
• If someone demands a compensation payment,
someone requests the compensation payment.
Weconsider the�rst rule to beunacceptable while the
secondexpresses aproper implication. Thedifference
is the situations in which the descriptions in the con-
sequences hold. In our view, the situations described
by �order� are more speci�c than those referred to by
�request�. In other words, �order� holds in a smaller
range of situations than �request�. Requesting some-
thing can happen in any situations where there exists
someone who can demand something, but ordering
can occur only ina situationswheresomeonein apar-
ticular social positioncandemandsomething. Theba-
sic assumption behind our bias is that rules with con-
sequences that can be valid in a wider range of situa-
tions, such as �requesting a compensation payment,�
are more likely to be proper ones than the rules with
consequences that hold in a smaller range of situa-
tions, such as �ordering a compensation payment�.
Thebias Parg(vcon)wasintroduced tocapture vari-
ationsofthesituations in which event descriptions are
valid. Weassume that frequently observed verbsform
generic descriptions that can be valid within a wide
range of events, while less frequent verbs tend to de-
scribe events that can occur in a narrower rangeof sit-
uations and form more speci�c descriptions than the
frequently observed verbs. Regarding the �request-
order� example, (a Japanese translation of) �request�
is observed more frequently than (a Japanese transla-
tionof)�order� incorporaandthis observation iscon-
sistent with our assumption. A similar idea by Geffet
and Dagan (Geffet and Dagan, 2005) was proposed
forcapturing lexical entailment. Thedifference is that
they relied on word co-occurrences rather than the
frequency of words to measure the speci�city of the
semantic contents of lexical descriptions, and needed
Web search to avoid data sparseness in co-occurrence
60
statistics. On the other hand, our method needs only
simpleoccurrence probabilities ofsingleverbsandwe
expect our method to be applicable to wider vocabu-
lary than Geffet and Dagan�s method.
The following is a more mathematical justi�cation
for the bias. According to the following discussion,
Parg(vcon) can be seen as a metric indicating how
easily we can establish an interpretation of the rule,
which is formalized as a mapping between events. In
our view, if we can establish the mapping easily, the
rule tendstobeacceptable. Thediscussion starts from
a formalization of an interpretation of an inference
rule. Consider the rule �If exp1 occurs, usually exp2
occurs at the same time or before the occurrence of
exp1�, where exp1 and exp2 are natural language ex-
pressions referring to events. In the following, we call
such expressions event descriptions and distinguish
them from an actual event referred to by the expres-
sions. An actual event is called an event instance.A possible interpretation of the rule is that, for any
event instance e1 that can be described by the event
description exp1 in the precondition of the rule, there
always exists an event instance e2 that can be de-
scribed by the event description exp2 in the conse-
quence and that occurs at the same time as or before
e1 occurs. Let us write e : exp if event instance e
can be described by event description exp. The above
interpretation can then be represented by the formula
Φ : ∃f(∀e1(e1 : exp1 →∃e2(e2 = f(e1)∧e2 : exp2)).
Here, the mapping f represents a temporal relation
betweenevents, andtheformulae2 = f(e1)expresses
that e2 occurs at the same time as or beforee1.
The bias Parg(vcon) can be considered (an approx-
imation of) a parameter required for computing the
probability that a mapping frandom satis�es the re-
quirements for f in Φ when we randomly construct
frandom. The probability is denoted as P{e2 : exp2 ∧
e2 = frandom(e1)|e1 : exp1}E1 where E1 denotes
the number of events describable by exp1. We as-
sume that the larger this probability is, the more eas-
ily we can establish f. We can approximate P{e2 :
exp2∧e2 = frandom(e1)|e1 : exp1}asP(exp2)by1)
observing that theprobabilistic variables e1 ande2 are
independent since frandom associates them in a com-
pletely random manner and by 2) assuming that the
occurrence probability of the event instances describ-
able by exp2 can be approximated by the probability
that exp2 is observed in text corpora. This means that
P(exp2) is one of the metrics indicating how easily
we can establish the mapping f in Φ.
Then, the next question is what kind of expressions
should be regarded as the event description exp2. A
primary candidate will be the whole sentence appear-
ingin theconsequence part oftherule to beproduced.
Since we specify only a verb vcon and its argument n
in the consequence in a rule, P(exp2) can be denoted
by Parg(n,vcon), which is the probability that we ob-
serve the expression such that vcon is a head verb and
noccupies an argument position arg ofvcon. Bymul-
tiplying this probability to BasicS as a bias, we ob-
tain the following scoring function.
Scorecooc(n,vcon,vpre,arg,argprime) =
Parg(n,vcon)BasicS(n,vcon,vpre,arg,argprime)
In our experiments, though, this score did not work
well. Since Parg(n,vcon) often has a small value, the
problem of data sparseness seems to arise. Then, we
used Parg(vcon), which denotes the probability of ob-
serving sentences that contain vcon and its argument
position arg, no matter which noun occupies arg, in-
stead of Parg(n,vcon). We multiplied the probability
to BasicS as a bias and obtained the following score,
which is actually the scoring function we propose.
Score(n,vcon,vpre,arg,argprime) =
Parg(vcon)BasicS(n,vcon,vpre,arg,argprime)
4 Experiments
4.1 Settings
We parsed 35 years of newspaper articles (Yomiuri
87-01, Mainichi 91-99, Nikkei 90-00, 3.24GB in to-
tal) and 92.6GB of HTML documents downloaded
from the WWW using an existing parser (Kanayama
et al., 2000) to obtain the word (co-occurrence) fre-
quencies. All the probabilities used in our method
were estimated by maximum likelihood estimation
from these frequencies. We randomly picked 600
nouns as a development set. We prepared three test
sets, namely test sets A, B, and C, which consisted of
100 nouns, 250 nouns and 1,000 nouns respectively.
Note that all the nouns in the test sets were randomly
picked and did not have any common items with the
development set. In all the experiments, four human
judges checked ifeachproduced rulewasaproper one
without knowing how each rule was produced.
4.2 Effects ofUsing Coordinated Sentences
In the �rst series of experiments, we compared a
simpli�ed version of our scoring function BasicS
with some alternative scores. This was mainly
to check if coordinated sentences can improve
accuracy. The alternative scores we considered
61
 0
 20
 40
 60
 80
 100
 0  50  100  150  200  250  300  350  400
Pre
cisi
on
 (%
)
Number of inference rules
BasicS
S-VV
S-NV
MI
Conditional
Rand
Figure1: Comparison with the alternatives (4 judges)
 0
 20
 40
 60
 80
 100
 0  50  100  150  200  250  300  350  400
Pre
cisi
on
 (%
)
Number of inference rules
BasicS
S-VV
S-NV
MI
Conditional
Rand
Figure2: Comparison with the alternatives (3 judges)
are presented below. Note that we did not test
our bias mechanism in this series of experiments.
S-VV(n,vcon,vpre,arg,argprime) =
Parg(n,vcon)Pargprime(n,vpre)/P(n)2
S-NV(n,vcon,vpre) = Pcoord(vcon,vpre)
MI(n,vcon,vpre) = Pcoord(vcon,vpre)/(P(vcon)P(vpre))
Cond(n,vcon,vpre,arg,argprime)
= Pcoord(vcon,vpre,arg,argprime)Parg(n|vcon)Pargprime(n|vpre)
/(Pargprime(n,vpre)P(n))
Rand(n,vcon,vpre,arg,argprime) = randomnumber
S-VV was obtained by approximating the proba-
bilities of coordinated sentences, as in the case of
BasicS. However, we assumed the occurrences of
two verbs were independent. The difference between
the performance of this score and that of BasicS
will indicate the effectiveness of using verb-verb
co-occurrences in coordinated sentences.
The second alternative, S-NV, simply ignores the
noun-verb co-occurrences in BasicS. MI is a score
based onmutual information androughly corresponds
to the score used in a previous attempt to acquire tem-
poral relations between events (Chklovski and Pan-
tel, 2004). Cond is an approximation of the proba-
bility P(n,vcon|n,vpre); i.e., the conditional proba-
 0
 20
 40
 60
 80
 100
 0  20  40  60  80  100  120
Pre
cisi
on
 (%
)
Number of inference rules
BasicS
S-VV
S-NV
MI
Cond
Figure3: Comparison with the alternatives (3 judges)
bility that the coordinated sentences consisting of n,
vcon andvpre are observed given the precondition part
consisting of vpre and n. Rand is a random number
and generates rules by combining verbs that co-occur
with the given n randomly. This was used as a base-
line method of our task
Theresulting precisions areshown in Figures 1 and
2. The �gure captions specify �(4 judges)�, as in Fig-
ure 1, when the acceptable rules included only those
regarded as proper by all four judges; the captions
specify �(3 judges)�, as in Figure 2, when the ac-
ceptable rules include those considered proper by at
least three of the fourjudges. We used test set A (100
nouns) and produced the top four rule candidates for
each noun according to each score. As the �nal re-
sults, all the produced rules for all the nouns were
sorted according to each score, and a precision was
obtained for top N rules in the sorted list. This was
thesameastheprecision achieved bysetting thescore
valueofN-thruleinthesorted list asthreshold θ. No-
tice that BasicS outperformed all the alternatives2,
though the difference between S-VV and BasicS
was rather small. Another important point is that the
precisions obtained with the scores that ignored noun-
verb co-occurrences were quite low. These �ndings
suggest that 1) coordinated sentences can be useful
clues for obtaining temporally constrained rules and
2) noun-verb co-occurrences are also important clues.
Intheaboveexperiments, weactually allowednoun
n to appear as argument types other than the syntac-
tic objects of a verb. When we restricted the argu-
2Actually, the experiments concerning Rand were conducted
considerably after the experiments on the other scores, and only
the two of the four judges for Rand were included in the judges
for other scores. However, we think that the superiority of our
score BasicS over the baseline method was con�rmed since the
precision of Rand was drastically lower than that of BasicS
62
 20
 30
 40
 50
 60
 70
 80
 90
 100
 0  50  100  150  200  250  300  350
Pre
cisi
on
 (%
)
Number of inference rules
Proposed direction
Reversed
Figure4: Two directions of implications (3 judges)
ment types to syntactic objects, as described in Sec-
tion 2, the precision shown in Figure 3 was obtained.
In most cases, BasicS outperformed the alternatives.
Although the number of produced rules was reduced
because of this restriction, the precision of all pro-
duced rules was improved. Because of this, we de-
cided to restrict the argument type to objects.
The kappa statistic for assessing the inter-rater
agreement was 0.53,which indicates moderate agree-
ment according to Landis and Koch, 1977. The kappa
value for only the judgments on rules produced by
BasicS rose to 0.59. After we restricted the verb-
noun co-occurrences to verb-object co-occurrences,
the kappa became 0.49, while that for the rules pro-
duced by BasicS was 0.543.
4.3 Direction of Implications
Next, we examined the directions of implications and
the temporal order between events. We produced
1,000 rules for test set B (250 nouns) using the score
BasicS, again without restricting the argument types
of given nouns to syntactic objects. When we re-
stricted theargumentpositionstoobjects, weobtained
347rules. Then, fromeach generated rule, we created
a new rule having an opposite direction of implica-
tions. We swapped the precondition and the conse-
quence oftherule andreversed itstemporal order. For
instance, wecreated �If someoneenacts alaw, usually
someone enforces the law at the same time as or after
the enacting of the law� from �If someone enforces a
law, usually someone enacts the law at the same time
as or before the enforcing of the law�.
Figure 4 shows the results. �Proposed direction�
3These kappa values were calculated for the results except for
the ones obtained by the score Rand, which were assessed by
different judges. The kappa forRand was 0.33(fair agreement).
 40
 50
 60
 70
 80
 90
 100
 0  50  100  150  200  250  300  350  400
Pre
cisi
on
 (%
)
Number of inference rules
Score
reranked by Score
reranked by ScoreCooc
BasicS
reranked by PreBias
Figure 5: Effects of the bias (4 judges)
 40
 50
 60
 70
 80
 90
 100
 0  50  100  150  200  250  300  350  400
Pre
cisi
on
 (%
)
Number of inference rules
Score
reranked by Score
reranked by ScoreCooc
BasicS
reranked by PreBias
Figure 6: Effects of the bias (3 judges)
refers to the precision of the rules generated by our
method. The precision of the rules with the opposite
direction is indicated by �Reversed.� The precision of
�Reversed� was much lower than that of our method,
and this justi�es our choice of direction. The kappas
values for�BasicS� and�Reversed�were0.54and0.46
respectively. Both indicate moderate agreement.
4.4 Effects ofthe Bias
Last, we compared Score and BasicS to see the ef-
fect of our bias. This time, we used test set C (1,000
nouns). The rules were restricted to those in which
the given nouns are syntactic objects of two verbs.
Theevaluation was doneforonly the top400rules for
each score. The results are shown in Figures 5 and 6.
�Score� refers to the precision obtained with Score,
while �BasicS� indicates the precision with BasicS.
For most data points in both graphs, the �Score� pre-
cision was about 10% higher than the �BasicS� preci-
sion. InFigure6,the precision reached 70%when the
400 rules were produced. These results indicate the
desirable effect of our bias for, at least, the top rules.
63
rank inference rules
/judges
4/0 moshi yougi wo hininsuru naraba,
yougi wo mitomeru
(If someonedenies suspicions, usually
someonecon�rms the suspicions.)
6/4 moshi jikokiroku wo uwamawaru
naraba, jikokiroku wo koushinsuru
(If someonebetters her best record,usually
someonebreaks her best record.)
21/3 moshi katakuriko wo mabusu naraba,
katakuriko wo tsukeru
(If someonecoats something with potato starch,
usually someonecovers somethingwith the starch)
194/4 moshi sasshi wo haifusuru naraba,
sasshi wo sakuseisuru
(If someonedistributes a booklet, usually
someonemakes the booklet.)
303/4 moshi netsuzou wo kokuhakusuru
naraba, netsuzou wo mitomeru
(If someoneconfesses to a fabrication, usually
someoneadmits the fabrication.)
398/3 moshi ifuku wo kikaeru naraba,
ifuku wo nugu
(If someonechanges clothes, usually
someonegets out of the clothes.)
Figure7: Examples of acquired inference rules
The 400 rules generated by Score included 175 dis-
tinct nouns and 272 distinct verb pairs. Examples of
the inference rules acquired by Score are shown in
Figure 7 along with the positions in the ranking and
the numbers of judges who judged the rule as being
proper. (We omitted the phrase �the same time as or
before� in the examples.) The kappa was 0.57 (mod-
erate agreement).
In addition, the graphs compare Score with some
other alternatives. This comparison was made to
check the effectiveness of our bias more carefully.
The 400 rules generated by BasicS were re-ranked
using Score and the alternative scores, and the pre-
cision for each was computed using the human judg-
ments for the rules generated by BasicS. (We did
not evaluate the rules directly generated by the al-
ternatives to reduce the workload of the judges.)
The �rst alternative was Scorecooc, which was pre-
sented in Section 3. Here, �reranked by ScoreCooc�
refers to the precision obtained by re-ranking with of
Scorecooc. The precision was below that obtained by
the re-ranking with Score, (referred to as �reranked
by Score)�. As discussed in Section 3, this indicates
the bias Parg(vcon) in Score works better than the
bias Parg(n,vcon) in Scorecooc.
Thesecondalternative wasthe scoring function ob-
tained by replacing the bias Parg(vcon) in Score with
Pargprime(vpre) , which is roughly the probability that the
verb in the precondition will be observed. The score
is denoted as PreBias(n,vcon,vpre,arg,argprime) =
Pargprime(vpre)BasicS(n,vcon,vpre,arg,argprime). The
precision of this score is indicated by �reranked by
PreBias� and is much lower than that of �reranked by
Score�, indicating that only probability of the verbs
in the consequences should be used as a bias. This is
consistent with our assumption behind the bias.
5 Conclusion
We have presented an unsupervised method for ac-
quiring inference rules with temporal constraints,
such as �If someone enforces a law, someone enacts
the law at the same time as or before the enforcing of
the law�. We used the probabilities of verb-verb co-
occurrences in coordinated sentences and verb-noun
co-occurrences. We have also proposed a bias mecha-
nism that can improve the precision of acquired rules.
References
R. Barzilay and L. Lee. 2003. Learning to paraphrase:an
unsupervised approach using multiple-sequence align-
ment. In Proc. of HLT-NAACL 2003, pages 16�23.
T. Chklovski and P. Pantel. 2004. Verbocean: Mining the
webfor�ne-grainedsemanticverbrelations. InProc. of
EMNLP-04.
I. Dagan, O. Glickman, and B. Magnini, editors.
2005. Proceedings of the First Challenge Work-
shop: Recognizing Textual Entailment. available from
http://www.pascal-network.org/Challenges/RTE/.
T. Fujiki, H. Namba, and M. Okumura. 2003. Automatic
acquisition of script knowledgefromtext collection. In
Proc. of The Research Note Sessionsof EACL�03.
M. Geffet and I. Dagan. 2005. The distributional inclu-
sionhypotheses andlexical entailment. In Proc. of ACL
2005, pages 107�114.
H. Kanayama, K. Torisawa, Y. Mitsuishi, and J. Tsujii.
2000. AhybridJapaneseparserwithhand-craftedgram-
mar andstatistics. In Proc. of COLING 2000.
J. R. Landis and G. G. Koch. 1977. The measurement
of observer agreement for categorial data. Biometrics,
33:159�174.
D. Lin and P. Pantel. 2001. Discovery of inference rules
for question answering. Journal of Natural Language
Engineering.
Y. Shinyama, S. Sekine, and K. Sudo. 2002. Automatic
paraphrase acquisition from news articles. In Proc. of
HLT2002.
I. Szepektor, H. Tanev, I. Dagan, and B. Coppola. 2004.
Scaling web-based acquisition of entailment relations.
In Proc. of EMNLP 2004.
64
