In: Proceedings of CoNLL-2000 and LLL-2000, pages 115-118, Lisbon, Portugal, 2000. 
Learning IE Rules for a Set of Related Concepts 
J. Turmo and H. Rodriguez 
TALP Research Center. Universitat Politècnica de Catalunya
Jordi Girona Salgado, 1-3 
E-08034 Barcelona - Spain 
1 Introduction 
The growing availability of on-line text has led to an increase in the
use of automatic knowledge acquisition approaches from textual data. In
fact, a number of Information Extraction (IE) systems have emerged in
the past few years in relation to the MUC conferences (footnote 1). The
aim of an IE system is to automatically extract from text those pieces
of information that are relevant to a set of prescribed concepts (the
scenario). One of the main drawbacks of applying IE systems is the high
cost involved in manually adapting them to new domains and text styles.
In recent years, a variety of Machine Learning (ML) techniques have
been used to improve the portability of IE systems to new domains, as
in SRV (Freitag, 1998), RAPIER (Califf and Mooney, 1997), LIEP
(Huffman, 1996), CRYSTAL (Soderland et al., 1995) and WHISK (Soderland,
1999). However, some drawbacks remain in the portability of these
systems: a) existing systems generally depend on the supported text
style and learn IE rules for either structured, semi-structured or free
text; b) IE systems are mostly single-concept learning systems; c)
consequently, an extractor (e.g., a rule set) is learned independently
for each concept within the scenario; d) the order of execution of the
learners is set manually, and so are the scheduling and the way of
combining the resulting extractors; and e) regarding the training data,
the size of the available training corpora can be inadequate to
accurately learn extractors for all the concepts within the scenario
(footnote 2).
1 http://www.muc.saic.com/
2 This is so when dealing with some combinations of text style and
domain.
This paper describes EVIUS, a multi-concept learning system for free
text that follows a multi-strategy constructive learning (MCL) approach
(Michalski, 1993) and copes with training corpora of insufficient size.
EVIUS is a component of a multilingual IE system, M-TURBIO (Turmo et
al., 1999).
2 EVIUS. Learning rule sets for a 
set of related concepts 
The input of EVIUS is both a partially-parsed, semantically-tagged
(footnote 3) training corpus and a description of the desired target
structure. This description is provided as a set of concepts $C$
related by a set of asymmetric binary relations, $\mathcal{R}$.
In order to learn the set $S$ of IE rule sets for the whole of $C$,
EVIUS uses an MCL approach integrating constructive learning,
closed-loop learning and deductive restructuring (Ko, 1998).
In this multi-concept situation, the system determines which concepts
to learn and then incrementally updates $S$. This can be relatively
straightforward when using knowledge about the target structure in a
closed-loop learning approach. Starting with $C$, EVIUS iteratively
reduces the set $\mathcal{U}$ of unlearned concepts by selecting the
subset $\mathcal{P} \subseteq \mathcal{U}$ formed by the primitive
concepts in $\mathcal{U}$ and learning a rule set for each
$c \in \mathcal{P}$ (footnote 4). For instance, the single colour
scenario (footnote 5) in figure 1 is provided to learn from instances
of the following three related concepts: colour, as in the instance
"azul ligeramente claro" (slightly pale blue); colour_interval, as in
"entre rosa y rojo sangre" (between pink and blood red); and to_change,
as in "rojo vira a marrón" (red changes to brown).
3 With EuroWordNet (http://www.hum.uva.nl/~ewn/) synsets. No attempt
has been made to disambiguate such tags.
4 No cyclic scenarios are allowed, so that a topological sort of $C$ is
possible, starting with a set of primitive concepts.
5 Our testing domain is mycology. Texts consist of Spanish descriptions
of specimens. There is a rich variety of colour descriptions including
basic colours, intervals, changes, etc.
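Initially, $\mathcal{U} = C = \{\mathit{colour}, \mathit{colour\_interval},
\mathit{to\_change}\}$. Then, EVIUS calculates
$\mathcal{P} = \{\mathit{colour}\}$ and, once a rule set has been learned
for colour, the new $\mathcal{U} = \{\mathit{colour\_interval},
\mathit{to\_change}\}$ is studied, identifying $\mathcal{P} = \mathcal{U}$.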
Figure 1: A single scenario for the colour domain (diagram not
reproduced; its edges are labelled "to" and "from")
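The closed-loop selection of primitive concepts described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `learning_order` and the `depends_on` mapping are hypothetical, and a concept is treated as primitive when none of the concepts it is related to is still unlearned, which is one plausible reading of the topological sort mentioned in footnote 4.

```python
def learning_order(depends_on):
    """Return the successive sets P of primitive concepts.

    depends_on maps each concept to the set of concepts it is built
    from (hypothetical encoding of the scenario relations).
    """
    unlearned = set(depends_on)          # U, initially the whole of C
    order = []
    while unlearned:
        # P: concepts in U none of whose dependencies is still in U
        primitive = {c for c in unlearned
                     if not (depends_on[c] & unlearned)}
        if not primitive:
            raise ValueError("cyclic scenarios are not allowed")
        order.append(primitive)          # a rule set is learned for each c in P
        unlearned -= primitive
    return order


# Colour scenario; the dependency direction is an assumption.
scenario = {
    "colour": set(),
    "colour_interval": {"colour"},
    "to_change": {"colour"},
}
batches = learning_order(scenario)
```

With this encoding, colour is learned first and the two remaining concepts form the second batch, as in the example above.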
In order to learn a rule set for a concept, EVIUS uses the relational
learning method explained in section 3, and defines the learning space
by means of a dynamic predicate model. As a pre-process, the training
corpus is translated into predicates using the following initial
predicate model:
a) attributive meta-predicates: pos_X(A), isa_X(A), has_hypernym_X(A),
word_X(A) and lemma_X(A), where X is instantiated with closed
categories;
b) relational meta-predicates: distance_le_X(A,B), stating that there
are at most X terminal nodes between A and B; and
c) relational predicates: ancestor(A,B), where B is the syntactic
ancestor of A, and brother(A,B), where B is the right brother node of A
sharing the same syntactic ancestor.
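As a rough illustration of this translation step, the sketch below turns a left-to-right list of terminal nodes into ground predicates. It is not the EVIUS pre-processor: the function name, the dictionary keys and the `max_gap` bound are assumptions made for the example, and the predicates that need EuroWordNet or the parse tree (isa_X, has_hypernym_X, ancestor, brother) are left out.

```python
def corpus_to_predicates(terminals, max_gap=2):
    """Translate terminal nodes into ground predicates (as strings).

    terminals: list of dicts with the hypothetical keys
    'id', 'pos', 'word' and 'lemma', in sentence order.
    """
    facts = set()
    # Attributive meta-predicates: X is instantiated with the value.
    for t in terminals:
        facts.add(f"pos_{t['pos']}({t['id']})")
        facts.add(f"word_{t['word']}({t['id']})")
        facts.add(f"lemma_{t['lemma']}({t['id']})")
    # Relational meta-predicate: at most X terminals between A and B.
    for i, a in enumerate(terminals):
        for j in range(i + 1, len(terminals)):
            gap = j - i - 1              # terminals strictly between
            for x in range(gap, max_gap + 1):
                facts.add(f"distance_le_{x}({a['id']},{terminals[j]['id']})")
    return facts


fragment = [
    {"id": "n3", "pos": "a", "word": "rojo", "lemma": "rojo"},
    {"id": "n4", "pos": "v", "word": "vira", "lemma": "virar"},
    {"id": "n5", "pos": "prep", "word": "a", "lemma": "a"},
]
facts = corpus_to_predicates(fragment)
```

Here `distance_le_1(n3,n5)` is produced because exactly one terminal (n4) lies between n3 and n5, whereas `distance_le_0(n3,n5)` is not.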
Once a rule set for concept c has been learned, new examples are added
for further learning by means of a deductive restructuring approach:
training examples are reduced in order to generate more compact and
useful knowledge of the learned concept. This is achieved by using the
induced rule set and a syntactico-semantic transformational grammar. In
addition, a new predicate isa_c is added to the model.
For instance, in figure 2 (footnote 6), the Spanish sentence "su color
rojo vira a marrón oscuro" (its red colour changes to dark brown) has
two examples of colour, n3 and n6+n7, these being "rojo" (red) and
"marrón"+"oscuro" (dark brown). No reduction is required for the
former. However, the latter example is reduced to node n6'. As a
consequence, two new attributes are added to the model: isa_colour(n3)
and isa_colour(n6'). This new knowledge will be used to learn the
concepts to_change and colour_interval.
6 Which is presented here as a partially-parsed tree for simplicity.
[Figure 2, diagram not reproduced. Before reduction, S (n12) spans the
terminals su (n1, spec), color (n2, n), rojo (n3, a), vira (n4, v),
a (n5, prep), marrón (n6, n) and oscuro (n7, a). After reduction, n6 and
n7 are replaced by a single gnom node, n6'.]
Figure 2: Restructuring training examples
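The restructuring step in this example can be mimicked with the following sketch. It assumes that the spans recognised by the induced rule set are already given, and it ignores the syntactico-semantic transformational grammar altogether; the function name and the convention of naming the reduced node after its first terminal plus a prime are taken from the example, not from the EVIUS implementation.

```python
def restructure(terminal_ids, spans, concept):
    """Collapse multi-node instances of a learned concept.

    terminal_ids: terminal node ids in sentence order.
    spans: (first_id, last_id) pairs covered by the induced rule set.
    Returns the new node sequence and the added isa_<concept> facts.
    """
    ids = list(terminal_ids)
    facts = []
    for first, last in spans:
        i, j = ids.index(first), ids.index(last)
        if i == j:
            node = first                 # single node: no reduction needed
        else:
            node = first + "'"           # e.g. n6+n7 becomes n6'
            ids[i:j + 1] = [node]
        facts.append(f"isa_{concept}({node})")
    return ids, facts


sentence = ["n1", "n2", "n3", "n4", "n5", "n6", "n7"]
new_ids, new_facts = restructure(sentence, [("n3", "n3"), ("n6", "n7")],
                                 "colour")
```

The resulting facts are the two attributes named in the text, and the node sequence ends in n6' instead of n6 and n7.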
3 Rule set learning 
EVIUS uses FOIL (First-Order Inductive Learner) (Quinlan, 1990) to
build an initial rule set $\mathcal{R}_0$ from a set of positive and
negative examples. Positive examples $\mathcal{E}^+$ can be selected
using a user-friendly environment, either as:
• text relations: $c(A_1, A_2)$, where both $A_1$ and $A_2$ are
  terminal nodes that exactly delimit a text value for c. For instance,
  the text relations colour(n3,n3) and colour(n6,n7) in figure 2; or as:
• ontology relations: $c(A_1, A_2, \ldots, A_n)$, where all $A_i$ are
  terminal nodes which are instances of already learned concepts
  related to c in the scenario. For instance, the ontology relation
  to_change(n3,n6') (footnote 7), in the same figure, means that the
  colour represented by instance n3 changes to that represented by n6'.
Negative examples $\mathcal{E}^-$ are automatically selected as
explained in section 3.1.
7 Note that, after the deductive restructuring step, both n3 and n6'
are instances of the concept colour.
If a set of uncovered positive examples, $\mathcal{E}_u^+$, remains
after FOIL has been run, this is due to the lack of sufficient
examples. Thus, the system tries to improve recall by growing the set
$\mathcal{E}^+$ with artificial examples (pseudo-examples), as
explained in section 3.2. FOIL is then executed again using the new
$\mathcal{E}^+$. The resulting rule set $\mathcal{R}'$ is combined with
$\mathcal{R}_0$ in order to create $\mathcal{R}_1$ by appending the new
rules from $\mathcal{R}'$ to $\mathcal{R}_0$. Consequently, the recall
of $\mathcal{R}_1$ is forced to be at least equal to that of
$\mathcal{R}_0$, although the accuracy can decrease. A better method
seems to be the merging of rules from $\mathcal{R}'$ and
$\mathcal{R}_0$ by studying empirical subsumptions. This last
combination makes it possible to create more compact and accurate rule
sets.
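EVIUS uses an incremental learning approach to learn rule sets for each
concept. This is done by iterating the process above while uncovered
examples remain and the F1 score increment ($\Delta F_1$) is greater
than a pre-defined constant $\alpha$:

    select $\mathcal{E}^+$ and generate $\mathcal{E}^-$
    $\mathcal{R}_0$ = FOIL($\mathcal{E}^+$, $\mathcal{E}^-$)
    $\mathcal{E}_u^+$ = uncovered_from($\mathcal{R}_0$)
    $\Delta F_1$ = $F_1(\mathcal{R}_0)$
    while $\mathcal{E}_u^+ \neq \emptyset$ and $\Delta F_1 > \alpha$ do
        $\mathcal{E}^+$ = $\mathcal{E}^+ \cup$ pseudo-examples($\mathcal{E}_u^+$)
        $\mathcal{R}'$ = FOIL($\mathcal{E}^+$, $\mathcal{E}^-$)
        $\mathcal{R}_{i+1}$ = combine_rules($\mathcal{R}_i$, $\mathcal{R}'$)
        $\mathcal{E}_u^+$ = uncovered_from($\mathcal{R}_{i+1}$)
        $\Delta F_1$ = $F_1(\mathcal{R}_{i+1}) - F_1(\mathcal{R}_i)$
    endwhile
    if $\Delta F_1 > \alpha$ then return $\mathcal{R}_{i+1}$
    else return $\mathcal{R}_i$
    endif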
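The control flow of this loop can be made concrete with the Python sketch below. FOIL itself, the coverage test, the pseudo-example generator and the F1 evaluation are passed in as callables, so nothing here should be read as the EVIUS code; all function and parameter names are assumptions, rules are combined by simple appending, and the toy callables at the end exist only to exercise the stopping rule.

```python
import itertools


def combine_rules(old, new):
    # Append the rules of R' that are not already in R_i.
    return old + [r for r in new if r not in old]


def learn_rule_set(pos, neg, foil, uncovered_from, pseudo_examples,
                   f1, alpha):
    pos = set(pos)
    prev = curr = foil(pos, neg)              # R_0
    uncovered = uncovered_from(curr)          # E_u^+
    delta = f1(curr)                          # initial increment: F1(R_0)
    while uncovered and delta > alpha:
        pos |= set(pseudo_examples(uncovered))
        new_rules = foil(pos, neg)            # R'
        prev, curr = curr, combine_rules(curr, new_rules)
        uncovered = uncovered_from(curr)
        delta = f1(curr) - f1(prev)
    # Keep the last rule set only if it improved F1 by more than alpha.
    return curr if delta > alpha else prev


# Toy stand-ins (hypothetical) to exercise the control flow.
counter = itertools.count()
toy_f1 = {1: 0.80, 2: 0.90, 3: 0.905}
result = learn_rule_set(
    pos={("a",)},
    neg=set(),
    foil=lambda pos, neg: [f"rule{len(pos)}"],
    uncovered_from=lambda rules: [("a",)],    # always one uncovered example
    pseudo_examples=lambda unc: [("pseudo", next(counter))],
    f1=lambda rules: toy_f1[len(rules)],
    alpha=0.01,
)
```

In the toy run the second iteration raises F1 by only 0.005, so the rule set from the first iteration is the one returned.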
3.1 Generating relevant negative 
examples 
Negative examples can be defined as any combination of terminal nodes
not in $\mathcal{E}^+$. However, this approach produces an extremely
large number of examples, out of which only a small subset is relevant
to learn the concept. Related to this, Freitag (1998) uses words to
learn only slot rules (learned from text-relation examples), selecting
as negative those non-positive word pairs that define a string neither
longer than the maximum length in positive examples nor shorter than
the minimum.
A more general approach is adopted here: a distance between possible
examples in the learning space is defined, and a clustering method is
applied using positive examples as medoids (footnote 8). The N nearest
non-positive examples to each medoid can be selected as negative ones.
Distance, in our case, must be defined as multidimensional due to the
typology of occurring features. It is relatively easy to define
distances between examples for word_X and lemma_X predicates: 0 when X
values are equal, and 1 otherwise. For isa_X predicates, the minimum of
all possible conceptual distances (Agirre and Rigau, 1995) between X
values in EuroWordNet (EWN) has been used. Greater difficulty is
encountered when defining a distance from a morpho-syntactic point of
view (e.g., a pronoun seems to be closer to a noun than to a verb). In
(Turmo et al., 1999), the concept of 5-set was presented as a
generalization of syntactic relations, and a distance measure was based
on this concept.
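The selection of the N nearest non-positive examples around each medoid can be sketched as follows. The distance used here is a deliberate simplification: a per-feature 0/1 mismatch count stands in for the multidimensional distance of the paper, so conceptual and morpho-syntactic distances are not modelled. The function names and the (word, lemma, part-of-speech) tuples are assumptions made for the example.

```python
def mismatch_distance(a, b):
    # Simplified distance: number of feature positions that differ.
    return sum(1 for x, y in zip(a, b) if x != y)


def select_negatives(positives, candidates, n, distance=mismatch_distance):
    """Pick, for each positive example (medoid), its n nearest
    non-positive candidates; the union is the negative set."""
    positive_set = set(positives)
    non_positive = [c for c in candidates if c not in positive_set]
    negatives = set()
    for medoid in positives:
        nearest = sorted(non_positive,
                         key=lambda c: distance(medoid, c))[:n]
        negatives.update(nearest)
    return negatives


positives = [("rojo", "rojo", "a")]
candidates = [
    ("rojo", "rojo", "a"),
    ("roja", "rojo", "a"),
    ("vira", "virar", "v"),
    ("color", "color", "n"),
]
negatives = select_negatives(positives, candidates, n=1)
```

Only the candidate that differs from the medoid in a single feature is kept, which illustrates how the method concentrates on negatives that lie close to the positive examples.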
3.2 Creating pseudo-examples 
A method inspired by the generation of convex pseudo-data (Breiman,
1998) has been used, in which a process similar to gene combination in
genetic algorithms is applied.
For each positive example $c(A_1, \ldots, A_n)$ (footnote 9) of the
concept c to be dealt with, an attribute vector is defined as

    $(\mathit{word\_X}_{B_1}, \ldots, \mathit{word\_X}_{B_n},
      \mathit{lemma\_X}_{B_1}, \ldots, \mathit{lemma\_X}_{B_n},
      \mathit{sem\_X}_{B_1}, \ldots, \mathit{sem\_X}_{B_n},
      \mathit{context})$

where $B_1, \ldots, B_n$ are the unrepeated terminal nodes from
$A_1, \ldots, A_n$, context is the set of all predicates subsumed by
the syntactico-semantic structure between the nearest positive example
on the left and the nearest one on the right, and
$\mathit{sem\_X}_{B_i}$ is the list of isa_X and has_hypernym_X
predicates for $B_i$.
Then, for each example left uncovered by the rule set learned by FOIL,
a set of pseudo-examples is generated. A pseudo-example is built by
combining the vector of the uncovered example with that of a randomly
selected covered one: for each dimension, one of the two possible
values is randomly selected as the value for the pseudo-example.
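The per-dimension recombination just described can be sketched as below. The attribute vectors are reduced to four illustrative dimensions and their values are made up; the function name and the use of a seeded `random.Random` instance are assumptions for the sake of a reproducible example.

```python
import random


def make_pseudo_examples(uncovered, covered, k, rng):
    """Build k pseudo-examples for one uncovered attribute vector.

    Each pseudo-example mixes the uncovered vector with a randomly
    chosen covered vector, picking one of the two values per dimension.
    """
    pseudo = []
    for _ in range(k):
        partner = rng.choice(covered)
        pseudo.append(tuple(rng.choice((u, c))
                            for u, c in zip(uncovered, partner)))
    return pseudo


rng = random.Random(0)
# Hypothetical vectors: (word, lemma, sem, context).
uncovered = ("ocre", "ocre", "isa_colour", "ctx_u")
covered = [
    ("rojo", "rojo", "isa_colour", "ctx_1"),
    ("azul", "azul", "isa_colour", "ctx_2"),
]
pseudo = make_pseudo_examples(uncovered, covered, k=5, rng=rng)
```

Each generated vector takes every value either from the uncovered example or from its randomly chosen covered partner.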
8 A medoid is an actual data point representing a cluster.
9 As defined in section 3.
T. Set*   $\mathcal{E}^+$   Recall   Prec.   F1
$15_0$    105               56.86    100     0.725
$25_0$    206               62.74    98.45   0.766
$35_0$    270               73.53    97.40   0.838
$45_0$    328               75.49    98.72   0.856
$55_0$    398               75.49    98.72   0.856
Table 1: Results for the colour concept for different training set
sizes (* subscript 0 means only one FOIL iteration)
4 Evaluation 
EVIUS has been tested on the mycological domain. A set of 68 Spanish
mycological documents (covering 9800 words corresponding to 1360
lemmas) has been used. Thirteen of them have been kept for testing and
the others for training. The target ontology consisted of 14 concepts
and 24 relations.
Several experiments have been carried out with different training sets.
Results of the initial rule set for the colour concept (footnote 10)
are presented in table 1.
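As a sanity check on table 1, the F1 column can be recomputed from the recall and precision columns, assuming the usual balanced F-measure $F_1 = 2PR/(P+R)$ (the paper does not spell out the formula). The dictionary below simply copies the table values; the key names are arbitrary.

```python
def f1_score(recall, precision):
    # Balanced F-measure: harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# (recall %, precision %, reported F1) per training set, from table 1.
table1 = {
    "15_0": (56.86, 100.0, 0.725),
    "25_0": (62.74, 98.45, 0.766),
    "35_0": (73.53, 97.40, 0.838),
    "45_0": (75.49, 98.72, 0.856),
    "55_0": (75.49, 98.72, 0.856),
}

# Largest gap between the recomputed and the reported F1 values.
max_gap = max(abs(f1_score(r, p) / 100 - reported)
              for r, p, reported in table1.values())
```

Under that assumption, the recomputed values agree with the reported ones to three decimal places.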
Out of the 34 rules in the $35_0$ initial rule set, one of the most
relevant learned rules is (footnote 11):

    colour(A,B) :- has_hypernym_00017586n(B),
                   has_hypernym_03464624n(A), brother(A,B).
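To make the reading of this rule explicit, the sketch below checks its three body literals against a small set of ground facts. The fact encoding as tuples and the nodes n1 and n2 are hypothetical; in particular, the assumption that n1 is tagged with a hyponym of 03464624n and n2 with a hyponym of 00017586n is made up for the example.

```python
def colour(a, b, facts):
    """True when the body of the learned rule holds for nodes a and b."""
    return (("has_hypernym_00017586n", b) in facts
            and ("has_hypernym_03464624n", a) in facts
            and ("brother", a, b) in facts)


# Hypothetical facts for a two-word colour description.
facts = {
    ("has_hypernym_03464624n", "n1"),   # n1: a chromatic colour
    ("has_hypernym_00017586n", "n2"),   # n2: an attribute
    ("brother", "n1", "n2"),            # n2 is the right brother of n1
}
matches = colour("n1", "n2", facts)
reversed_matches = colour("n2", "n1", facts)
```

The rule fires for the pair (n1, n2) but not for the reversed pair, since brother(A,B) requires B to be the right brother of A.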
Table 2 shows the results of adding pseudo-examples to the $35_0$
(footnote 12) training set and using the algorithm in section 3. This
was tested with $\alpha = 0.01$ (two iterations are enough, $35_1$ and
$35_2$) and 5 pseudo-examples for each uncovered case. The algorithm
returns the rule set produced in the first iteration because the
increment $\Delta F_{1T}$ (footnote 13) between the first and the
second iterations, 0.987 - 0.981 = 0.006, is not greater than 0.01.
Better results can be obtained when using lower values for $\alpha$.
T. Set    $\mathcal{E}^+$   $F_{1T}$   Recall   Prec.   F1
$35_1$    415               0.981      76.47    97.50   0.857
$35_2$    465               0.987      79.41    97.50   0.875
Table 2: Results from adding pseudo-examples to the initial training
set with 35 documents.
Although no direct comparison with other systems is possible due to the
domain and language used, our results can be considered
state-of-the-art with regard to similar MUC competition tasks.
10 This concept appears to be the most difficult to learn.
11 A chromatic colour (03464624n) that is the left syntactic brother of
an attribute (00017586n) such as luminosity or another chromatic
colour.
12 This size has been selected to allow a better comparison with the
results in table 1.
13 $F_{1T}$ denotes the F1 value for training sets.

References 
Eneko Agirre and German Rigau. 1995. A Proposal for Word Sense
  Disambiguation using Conceptual Distance. In Proceedings of the
  International Conference RANLP, Tzigov Chark, Bulgaria.
L. Breiman. 1998. Arcing Classifiers. The Annals of Statistics,
  26(3):801-849.
M.E. Califf and R. Mooney. 1997. Relational learning of pattern-match
  rules for information extraction. In Workshop on Natural Language
  Learning, pages 9-15. ACL.
D. Freitag. 1998. Machine Learning for Information Extraction in
  Informal Domains. Ph.D. thesis, Computer Science Department, Carnegie
  Mellon University.
S. Huffman. 1996. Learning information extraction patterns from
  examples. In S. Wermter, E. Riloff, and G. Scheler, editors,
  Connectionist, statistical and symbolic approaches to learning for
  natural language processing. Springer-Verlag.
H. Ko. 1998. Empirical assembly sequence planning: A multistrategy
  constructive learning approach. In R.S. Michalski, I. Bratko, and
  M. Kubat, editors, Machine Learning and Data Mining. John Wiley &
  Sons Ltd.
R.S. Michalski. 1993. Towards a unified theory of learning:
  Multistrategy task-adaptive learning. In B.G. Buchanan and
  D. Wilkins, editors, Readings in Knowledge Acquisition and Learning.
  Morgan Kaufmann.
J.R. Quinlan. 1990. Learning logical definitions from relations.
  Machine Learning, 5:239-266.
S. Soderland, D. Fisher, J. Aseltine, and W. Lehnert. 1995. Crystal:
  Inducing a conceptual dictionary. In XIV International Joint
  Conference on Artificial Intelligence, pages 1314-1321.
S. Soderland. 1999. Learning information extraction rules for
  semi-structured and free text. Machine Learning, 34:233-272.
J. Turmo, N. Català, and H. Rodríguez. 1999. An adaptable IE system to
  new domains. Applied Intelligence, 10(2/3):225-246.
