Learning Semantic-Level Information Extraction Rules by 
Type-Oriented ILP 
Yutaka Sasaki and Yoshihiro Matsuo 
NTT Communication Science Laboratories 
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto 619-0237, Japan 
{sasaki, yosihiro} ~cslab.kecl.ntt.co.jp 
Abstract 
This paper describes an approach to using se- 
mantic rcprcsentations for learning information 
extraction (IE) rules by a type-oriented induc- 
tire logic programming (ILl)) system. NLP 
components of a lnachine translation system are 
used to automatically generate semantic repre- 
sentations of text corpus that can be given di- 
rectly to an ILP system. The latest experimen- 
tal results show high precision and recall of the 
learned rules. 
1 Introduction 
Information extraction (IE) tasks in this paper 
involve the MUC-3 style IE. The input for the 
information extraction task is an empty tem- 
plate and a set of natural  texts that de- 
scribe a restricted target domain, such as corpo- 
rate mergers or terrorist atta.cks in South Amer- 
ica. Templates have a record-like data struc- 
ture with slots that have names, e.g., "company 
name" and "merger d~te", and v~lues. The out- 
put is a set of filled templates. IE tasks are 
highly domain-dependent, so rules and dictio- 
naries for filling values in the telnp\]ate slots de- 
pend on the domain. 
it is a heavy burden for IE system develop- 
ers that such systems depend on hand-made 
rules, which cannot be easily constructed and 
changed. For example, Umass/MUC-3 needed 
about 1,500 person-hours of highly skilled labor 
to build the IE rules and represent them as a 
dictionary (Lehnert, 1992). All the rules must 
be reconstructed i'rom scratch when the target 
domain is changed. 
To cope with this problem, some pioneers 
have studied methods for learning information 
extraction rules (Riloff,1996; Soderland ctal., 
1.995; Kim et el., 1995; Huffman, 1996; Califf 
and Mooney, 1997). Along these lines, our ap- 
preach is to a.pply an inductive logic program- 
ruing (ILP) (Muggleton, 1991)system to the 
learning of IE rules, where information is ex- 
tracted from semantic representations of news 
articles. The ILP system that we employed is 
a type-oriented ILP system I{\]\]B + (Sasaki and 
Haruno, 1997), which can efficiently and effec- 
tively h~mdle type (or sort) information in train- 
ing data. 
2 Our Approach to IE Tasks 
This section describes our approach to IE tasks. 
Figure 1. is an overview of our approach to learn- 
ing IE rules using an II, P system from seman- 
tic representations. First, training articles are 
analyzed and converted into semantic represen- 
tations, which are filled case fl'ames represented 
as atomic formulae. Training templates are pre- 
pared by hand as well. The ILP system learns 
\]!!; rules in the tbrm of logic l)rograms with type 
information. To extract key inlbrmation from a 
new ~rticle, semantic representation s au tomat- 
ically generated from the article is matched by 
the IE rules. Extracted intbrmation is filled into 
the template slots. 
3 NLP Resources and Tools 
3.1 The Semantic Attribute System 
We used the semantic attribute system of "Gel 
Taikei -- A Japanese Lexicon" (lkehara el el., 
1997a; Kurohashi and Sakai, 1.999) compiled by 
the NTT Communication Science Laboratories 
for a Japanese-to-English machine translation 
system, ALT-J/E (Ikehm:a et al., 1994). The se- 
mantic attribute system is a sort of hierarchical 
concept thesaurus represented as a tree struc- 
ture in which each node is called a semantic 
cateqory. An edge in the tree represents an is_a 
or has_a relation between two categories. The 
semantic attribute system is 11.2 levels deep and 
698 
semantic representation new article 
''' \[\]\]\] s°~chy yze ~ Analyze I rolease(cl,pl) articles sentences announce(cl,dl) 
i'~kackgrou n d Anal 
.... nzwledge / \[E rules ~ F representatiOn sentences I semantic .ooitive ..... l 
I re,oa o x. , II 
answer 
templates filled Company: c2 ~7"A;p'-'"iyrule~='~" "~'~ by hand Draotauotd..~2 to semantic 
I- I ,opreseot t,on 
Figure l: l/lock diagram of IE using IM ) 
contains about 3,000 sema.ntic category nodes. 
More than 300,000 Japanese words a.re linked to 
the category nodes. 
3.2 Verb Case Frame Dictionary 
The Japanese-to-li;nglish valency 1)a.ttern dic- 
tionary of "(\]oi-Taikei" (lkehara et al., 1997b; 
Kurohash.i and Saka.i, 1999) was also originally 
developed for ALT-,I/IB. The. wde:ncy dictionary 
conta.ins about 15,000 case frames with sema.n- 
tic restrictions on their arguments lbr 6,000 
a apanese verbs. Each ca.se frame consists of one 
predicate a.nd one or more case elements tha.t 
h ave a list; of sere an tic categories. 
3.3 Natural Language Processing Tools 
We used the N I,P COml)onents of kl/l'-.I/F, for 
text a, nalysis. These inclu<le the morphologica,l 
amdyzer, the syntactic analyzer, and the case 
aDalyzer for Japanese. The components a.re ro- 
bust a:nd generic tools, mainly ta:rgeted to news- 
paper articles. 
3.3.1 Generic Case Analyzer 
l,et us examine the case a.nalysis in more de- 
tail. The <'as(; analyzer reads a set of parse tree 
candidates produced by the J a.panese syntactic 
analyzer. The parse tree is :represented as a de- 
penden cy of ph rases (i. e., .\] al>anese bu'nsctmt). 
First, it divides the parse tree into unit sen- 
tences, where a unit sentence consists of one 
predicate and its noun and adverb dependent 
phrases. Second, it compares each unit sen- 
tence.with a verb case fl'alne dictionary, l!;ach 
frame consists a predicate condition and several 
cast elements conditions. The predicate con- 
dition specifies a verb that matches the frame 
a.:nd each case-role has a. case element condition 
whi ch sl>ecifie.s particles an d sere an tic categories 
of" noun phrases. The preference va.lue is de- 
lined as the summation of noun phrase \])refer- 
ences which are calculated from the distances 
between the categories of the input sentences 
m~d the categories written in the fi'amcs. The 
case a.na.lyzer then chooses the most preferable 
pa.rse tree and the most preferable combination 
of case frames. 
The valency dictionary also has case<roles 
(Table \] ) for :noun phrase conditions. The case- 
roles of adjuncts are determined by using the 
particles of adjuncts and the sema.ntic ca.tegories 
of n ou n ph ra.ses. 
As a result, the OUtl)ut O\[' the case a.nalysis is 
a set; el" (;ase fl:ames for ca.oh unit se:ntence. The 
noun phra.ses in \['tames are la.beled by case-roh;s 
in Tal)le 1. 
l!'or siml)\]icity , we use case-role codes, such a.s 
N 1 and N2, a.s the labels (or slot ha.rues) to rep- 
resent case li:ames. The relation between sen- 
tences and case-roles is described in detail in 
(Ikehara el el., 1993). 
3.3.2 Logical Form Translator 
We developed a logical form translator li'E1 ~ 
that generates semantic representations ex- 
pressed a,s atomic Ibrmulae from the cast; fi:a.mes 
and parse trees. For later use, document II) 
and tense inlbrmation a.re also added to the case 
frames. 
For example, tile case fl:ame in 'l.'able 2 is ob- 
tained a:l'ter analyzing the following sentence of 
document l) 1: 
"Jalctcu(.lack) h,a suts,tkesu(suitca.se) we 
699 
Table 1: Case-Roles 
Name Code Description l~xampl.e 
Subject N1 the agent/experiencer of I throw a ball. 
an event/situation 
Objectl N2 the object of an event 
Object2 N3 another object of an event 
Loc-Source N4 source location of a movement 
Loc-Goal N5 goal location of a movement 
Purpose N6 the purpose of an action 
Result N7 the result of an event 
Locative N8 the location of an event 
Comitative N9 co-experiencer 
Quotative N10 quoted expression 
Material N 11 material/ingredient 
Cause N12 the reason for an event 
Instrument N13 a concrete instrument 
Means N14 an abstract instrument 
Time-Position TN1 the time of an event 
Time-Source TN2 the starting time of an event 
Time-Goal TN3 the end time of ~n event 
Amount QUANT quantity of something 
I throw a ball. 
I compare it with them. 
I start fl'om Japan. 
I go to Japan. 
I go shopping. 
It results in failure. 
it occurs at the station. 
I share a room with him. 
I say that .... 
I fill the glass with water. 
It collapsed fr'om the weight. 
I speak with a microphone. 
I speak in Japanese. 
I go to bed at i0:00. 
I work from Monday. 
It continues until Monday. 
I spend $10. 
 hok,,ba(the omce) kava(from)    o(the air 
port) ,),i(to)ha~obu(carry)" 
("Jack carries a suitcase from the office to the 
airport.") 
Table 2: Case Frame of the Sample Sentence 
predicate: hakobu (carry) 
article: 1) 1 
tense: present 
NI: Jakhu (Jack) 
N2: sutsukesu (suitcase) 
N4: sl, okuba (the office) 
N5: kuko (the airport) 
4 Inductive Learning Tool 
Conventional ILP systems take a set of positive 
and negative examples, and background knowl- 
edge. The output is a set of hypotheses in the 
form of logic programs that covers positives and 
do not cover negatives. We employed the type- 
oriented ILP system RHB +. 
4.1 Features of Type-orlented ILP 
System RHB + 
The type-oriented I\],P system has the tbllowing 
features that match the needs for learning l\]"~ 
rules. 
• A type-oriented ILP system can efficiently 
and effectively handle type (or seman- 
tic category) information in training data.. 
This feature is adwmtageous in controlling 
the generality and accuracy of learned IE 
rules. 
• It can directly use semantic representations 
of the text as background knowledge. 
, It can learn from only positive examples. 
• Predicates are allowed to have labels (or 
keywords) for readability and expressibil- 
ity. 
4.2 Summary of Type-oriented ILP 
System RHB + 
This section summarizes tile employed type- 
oriented ILP system RHB +. The input of 
RHB + is a set of positive examples and back- 
ground knowledge including type hierarchy (or 
700 
the semantic attribute system). The output is 
a set of I\[orn clauses (Lloyd, 11.987) having vari- 
;tl~les with tyl)e intbrmation. That is, the term 
is extended to the v-term. 
4.3 v-terms 
v-terms are the restricted form of 0-terms (Ai't- 
K~tci and Nasr, 1986; Ait-Kaci et al., 11994). In- 
l'ormttlly, v-terms are Prolog terms whose vari- 
ables a.re replaced with variable Var of type T, 
which is denoted as Var:T. Predicttte ~tnd tim(:- 
tion symbols ~tre allowed to h;we features (or 
labels). For examl)\]e, 
speak( agent~ X :human,objcct~ Y :) 
is a clause based on r-terms which ha.s labels 
agent and object, and types human and 
. 
4.,4 Algorithm 
The algorithm of lHllI + is basically ~t greedy 
covering algorithm. It constructs clauses one- 
by-one by calling inner_loop (Algorithm \]) 
which returns a hypothesis clause. A hypoth- 
esis clause is tel)resented in the form of head :-- 
body. Covered examples are removed from 1 ) in 
each cycle. 
The inner loop consists of two phases: the 
head construction phase and the body construc- 
tion I)hase. It constrncts heads in a bottom-up 
manner and constructs the body in a top-down 
lna.nner, following the result described in (Zelle 
el al., 1994). 
"\['he search heuristic PWI is weighted infor- 
m~tivity eml)loying the l,a.place estimate. Let 
7' = {Head :-Body } U B K. 
rwz(r,T)_ l If'l+ J 
- -I.f'--/× 1°g2 IQ-~\]'i\[ _12 2' 
where IPl denotes the number of positive ex- 
amples covered by T and Q(T) is the empirical 
content. The smaller the value of PWI, the can- 
didate clause is better. Q(T) is defined as the 
set of atoms (1) that are derivable from T ~md 
(2) whose predicate is the target I)redicate, i.e., 
the predicate name of the head. 
The dynamic type restriction, by positivc ex- 
amples uses positive examples currently covered 
in order to determine appropriate types to wtri- 
~bles for the current clause. 
Algorithm 1 inner_loop 
1. Given positives P, original positives 1~o, back- 
ground knowledge 1Hr. 
2. Decidc typcs of variables in a head by comput- 
ing the lyped least general generalizations (lgg) 
of N pairs of clcmcnts in P, and select the most 
general head as H cad. 
3. If the stopping condition is satisfied, return 
Head. 
It. Let Body bc empty. 
5, Create a set of all possible literals L using vari- 
ables in Head and Body. 
6. Let BEAM be top If litcrals l~, of L wilh 
respect to the positive weighted informalivily 
PWI. 
7. Do later steps, assuming that l~ is added to 
Body for each literal lk in BEAM. 
8. Dynamically restrict types in Body by callin, g 
the dynamic type restriction by positive exam- 
pies. 
9. If the slopping condition is satisfied, rct'aru 
(Head :- Body). 
lO. Goto 5. 
5 Illustration of a Learning Process 
Now, we examine tile two short notices of' new 
products release in Table 3. The following table 
shows a sample te:ml)late tbr articles reporting 
a new product relea.se. 
Tom pl ate 
1. article id: 
2. coml)any: 
3. product: 
4. release date: 
5.1 Preparation 
Suppose that the following semantic represen- 
tations is obtained from Article 1. 
(cl) announce( article => I, tense => past, 
tnl => "this week", 
nl => "ABC Corp.", 
nlO => (c2)). 
(c2) release( article => I, 
tense => future, 
tni => "Jan. 20", 
nl => "ABC Corp.", 
n2 => "a color printer" ). 
701 
Table 3: Sample Sentences 
Article id Sentence 
#1 "ABC Corp. this week zmnounced that it will release a color printer on Jan. 20." 
#2 "XYZ Corp. released a color scanner last month." 
The filled template for Article 1 is as follows. 
Template \] 
\]. article id: 1 
2. colnpany: ABC Corp. 
3. product: a color printer 
4. release date: Jan. 20 
Suppose that the following semantic represen- 
tation is obtained from Article 2. 
(c3) release( article => 2, tense => past, 
tnl => "last month", 
nl => "XYZ Corp.", 
n2 => "a color scanner" ). 
The filled template for Article 2 is as follows. 
Template 2 
1. article id: 2 
2. company: XYZ Corp. 
3. product: a color scanner 
4. release date: last month 
5.2 Head Construction 
Two positive examples are selected for the tem- 
plate slot "company". 
company(article-number => i 
name => "ABe Corp"). 
company(article-number => 2 
name => "XYZ Corp"). 
By computing a least general generalization 
(lgg)sasaki97, the following head is obtained: 
company( article-number => Art: number 
name => Co: organization). 
5.3 Body Construction 
Generate possible literals 1 by combining predi- 
cate names and variables, then check the PWI 
1,1iterals,, here means atomic formulae or negated 
ones. 
values of clauses to which one of the literal 
added. In this case, suppose that adding the fol- 
lowing literal with predicate release is the best 
one. After the dynamic type restriction, the 
current clause satisfies the stopping condition. 
Finally, the rule for extracting "company name" 
is returned. Extraction rules for other slots 
"product" and "release date" can be learned in 
the sanle manner. Note that several literals may 
be needed in the body of the clause to satisfy 
the stopping condition. 
company(article-number => Art:number, 
name => Co: organization ) 
• - release( article => Art, 
tense => T: tense, 
tnl => D: time, 
nl => Co, 
n2 => P: product ). 
5.4 Extraction 
Now, we have tile following sen\]antic represen- 
tation extracted from the new article: 
Article 3: "JPN Corp. has released a new CI) 
player. ''2 
(c4) release( article => 3, 
tense => perfect_present, 
tnl => nil, 
nl => "JPN Corp.", 
n2 => "a new CD player" ). 
Applying the learned IE rules and other rules, 
we can obtain the filled template for Article 3. 
Template 3 
1. article id: 3 
2. company: JPN Corp. 
3. product: CI)player 
4. release date: 
2\;Ve always assume nil for the case that is not in- 
cluded in the sentence. 
702 
Table d: Learning results of new product release 
(a) Without data correction 
company product release date 
Precision 89.6% 
Recall 82.1% 
Average time (set.) 15.8 
l)recision 911 .1% 
Recall 85.7% 
Average time (sec.) 22.9 
80.5% 
66.7% 
22.J 
90.6% 
66.7% 
ld.d 
announce date \[ price 
lOO.O% 58.4% 
82.4:% 60.8% 
2.2 I 1.0 
(b) With data. correction 
company product release date 
80.o% 
69.7% 
25.2 
92.3% 
82.8% 
33.55 
annotmce date \[ price 
100.0% 87.1% 
88.2% 82.4% 
5.1.5 11.9 
6 Experimental Results 
6.1. Setting of Experhnents 
We extracted articles related to the release of 
new products from a one-year newspaper cor- 
pus written in Japanese 3. One-hundred arti- 
cles were randomly selected fi'om 362 relevant 
articles. The template we used consisted of 
tive slots: company name, prod'uct name, re- 
lease date, a~tnomzcc date, and price. We also 
filled one template for each a.rticle. After an- 
a.lyzing sentences, case fi'ames were converted 
into atomic tbrmulae representing semantic rep- 
re,,~entationx a.x described in Section 2 and 3. All 
the semantic representations were given to the 
lea.rner as background \]¢nowledge, ~md the tilled 
templates were given as positive examples. To 
speed-up the leCturing process, we selected pred- 
icate names that are relevant to the word s in the 
templates as the target predicates to be used by 
the ILl ~ system, and we also restricted the num- 
ber of literals in the body of hypotheses to one. 
Precision and recM1, the standard metrics \['or 
IF, tasks, are counted by using the remove-one- 
out cross validation on tile e, xamples for each 
item. We used a VArStation with tlie Pentium 
H Xeon (450 MHz):for this experiment. 
6.2 Results 
'l?M)le 4 shows the results of our experiment. In 
the experiment of learning from semantic repre- 
sentations, including errors in case-role selection 
and semantic category selection, precision was 
3We used ~rticles from the Mainichi Newspaimrs of 
1994 with permission. 
very high. 'l'he precision of the learned rules 
lot price was low beta.use the seman tic category 
name automatieaJly given to the price expres- 
sions in the dat~ were not quite a.ppropriate. 
For the tire items, 6?-82% recall was achieved. 
With the background knowledge having sere an- 
tic representations corrected by hand, precision 
was very high mid 70-88% recMl was achieved. 
The precision of price was markedly improved. 
It ix important that the extraction of live 
ditthrent pieces o1' information showed good re- 
sults. This indica.tex that the \]LI' system RIII~ + 
has a high potential in IE tasks. 
7 Related Work 
l)revious researches on generating lli; rules 
from texts with templates include AutoSlog- 
TS (Riloff,1996), (',I{YS'FAL (Soderland et al., 
1995), I'AIAKA (l(im et al., 1995), MlgP (Iluff- 
man, 11.996) and RAPII~;I~ (Califl' and Mooney, 
1997). In our approach, we use the type- 
oriented H,P system RItlJ +, which ix indepen- 
dent of natural  analysis. This point 
differentiates our ~pproach from the others. 
Learning semantic-level IE rules using an II,P 
system from semantic representations is also a 
new challenge in II'; studies. 
Sasald (Sasaki and Itaruno, 11997) applied 
RI{B + to the extraction of the number of deaths 
and injuries fi'om twenty five articles. That 
experiment was sufficient to assess the perfor- 
mance of the learner, but not to evaJuate its 
feasibility in IE tasks. 
703 
8 Conclusions and Remarks 
This paper described a use of semantic repre- 
sentations for generating information extraction 
rules by applying a type-oriented ILP system. 
Experiments were conducted on the data gen- 
erated fi'om 100 news articles in the domain of 
new product release. The results showed very 
high precision, recall of 67-82% without data 
correction and 70-88% recall with correct se- 
mantic representations. The extraction of five 
different pieces of information showed good re- 
sults. This indicates that our learner RHB + has 
a high potential in IE tasks. 

References 

H. Ai't-Kaci and R. Nasr, LOGIN: A logic pro- 
gramming  with built-in inheritance, 
Journal oJ' Logic Programming, 3, pp.185- 
215, 1986. 

lt. Ai't-Kaci, B. Dumant, R. Meyer, A. Podel- 
ski, and P. Van Roy, The Wild Life Itandbook, 
1994. 

M. E. Califf and R. J. Mooney, Relational 
Learning of Pattern-Match Rules for Informa- 
tion Extraction, Proc. of ACL-97 Workshop 
in Natural Language Learning, 1997. 

S. B. Huffman, Learning Information Extrac- 
tion Patterns from Examples, Statistical and 
Symbolic Approaches to Learning for Natural 
Language Processing, pp.246 260, 1996. 

S. ikehara, M. Miyazaki, and A. Yokoo, Clas- 
si:fication of  knowledge for mean- 
ing analysis in machine translations, Trans- 
actions of Information Processing Society 
of Japan, Vol.34, pp.1692-1704, 1993. (in 
.Japanese) 

S. Ikehara, S. Shirai, K. Ogura, A. Yokoo, 
H. Nakaiwa and T. Kawaoka, ALT-J/E: A 
Japanese to English Machine Translation Sys- 
tem tbr Communication with Translation, 
Proc. of The 13th IFIP World Computer 
Congress, pp.80-85, 1994. 

S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo, 
H. Nakaiwa, K. Ogura, Y. Oyama and 
Y. Hayashi (eds.), The Semantic Attribute 
System, Goi-lktikci -- A Japanese Lexi- 
con, Vol.1, Iwanami Publishing, 1997. (in 
Japanese) 

S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo, 
H. Nakaiwa, K. Ogura, Y. Oyama and 
Y. Hayashi (eds.), The Valency Dictionary, 
Goi-Taikei -- A Japanese Lcxicon, Vol.5, 
Iwa.nami Publishing, 1997. (in Japanese) 

J.-T. Kim and D. I. Moldovan, Acquisition 
of Linguistic Patterns for Knowledge-Based 
Information Extraction, \[EEE Transaction 
on Knowledge and Data Engineering (IEEE 
TKDE), Vol.7, No.5, pp.713 724, 1995. 

S. Kurohashi and Y. Sakai, Semantic Analysis 
of Japanese Noun Phrases: A New Approach 
to Dictionary-Based Understanding Thc 37th 
Annual Meeting of the Association for Com- 
putational Linguistics (A CL-99), pp.481-488, 
1999. 

W. Lehnert, C. Cardie, D. Fisher, J. McCarthy, 
E. Riloff and S. Soderland, University of Mas- 
sachusetts: MUC-4 Test Results and Analy- 
sis, Proc. of The 1;burth Message Understand- 
ing Conference (MUC-4), pp.151-158, 1992. 

J. Lloyd, Foundations of Logic Prog'mmming, 
Springer, 1987. 

S. Muggleton, Inductive logic programming, 
New Generation Computing, Vol.8, No.4, 
pp.295-318, 1991. 

E. Riloff, Automatically Generating Extrac- 
tion Pattern from Untagged Text, Proc.of 
American Association for Artificial IntcIli- 
gcnce (AAAI-96), pp.1044-1049, 1996. 

Y. Sasaki and M. IIaruno, RHB+: A Type- 
Oriented 1LP System Learning from Positive 
Data, Proc. of The l/jth International Joint 
Conference on Artificial Intelligence (LJCA l- 
9"/), pp.894-899, 1997. 

S. Soderland, 1). Fisher, J. Aseltine, W. Lenert, 
CRYSTAL: Inducing a Conceptual Dictio- 
n~ry, Proc. of The 13th International Joint 
ConJ'crcnce on Artificial Intelligence (IJCAI- 
95), pp.1314 1319, 1995. 

J. M. Zelle and R. J. Mooney, J. B. Konvisser, 
Combining Top-down and Bottom-up Meth- 
ods in Inductive Logic Programming, Proc 
of The 11th Tntcrnational Conference on Ma- 
chine Learning (ML-94), pp.343-351, 1994.  
