The Use of Lexical Semantics in Information Extraction*
Joyce Yue Chai    Alan W. Biermann
Department of Computer Science
Box 90129, Duke University
Durham, NC 27708-0129
chai@cs.duke.edu    awb@cs.duke.edu

*This work has been supported by a Fellowship from IBM Corporation.
Abstract 
This paper presents a method for en- 
abling users to specialize an information 
extraction system to satisfy their particular 
needs. The method allows the user to man- 
ually demonstrate the creation of seman- 
tic nodes and transitions while scanning a 
sample text article using a graphical user 
interface. On the basis of such examples, 
the system creates rules that translate text 
to semantic nets; then it generalizes these 
rules so that they can apply to a broad class 
of text instead of only the training articles. 
Finally, the generalized rules are used to 
scan large numbers of articles to extract 
the particular information targeted by the 
user. This paper concentrates on our de-
sign of the generalization mechanism, which
self-modifies to precisely match the user's
specification.
1 Introduction 
Customizing information extraction systems across 
different domains has become an important is- 
sue in Natural Language Processing. Many re- 
search groups are making progress toward efficient 
customization, such as BBN (Weischedel, 1995), 
NYU (Grishman, 1995), SRI (Appelt et al., 1995), 
SRA (Krupka, 1995), MITRE (Aberdeen et al., 
1995), and UMass (Fisher et al., 1995). SRI
developed a specification language called FAST- 
SPEC that automatically translates regular produc- 
tions written by the developer into finite state ma- 
chines (Appelt et al., 1995). FASTSPEC makes the 
customization easier by avoiding the effort of enu-
merating all the possible ways of expressing the tar- 
get information. The HASTEN system developed at
SRA (Krupka, 1995) employs a graphical user inter-
face that allows the user to create patterns by iden- 
tifying the important concepts in the text, as well 
as the relationships between the concepts. Then the 
concepts are manually generalized to word classes 
before the patterns are applied to other texts from 
the domain. 
We have built a trainable information extraction 
system that enables any user to adapt the system to 
different applications. The trainability of the system 
provides users the ability to identify the patterns for 
the information of interest. The training process is 
similar to that of the HASTEN system. However, instead
of manual generalization as in HASTEN, our system 
automatically generalizes patterns by use of Word- 
Net hierarchies. Automatic generalization of rules 
makes the customization process an easier one. 
This paper describes the automated rule general- 
ization method and the usage of WordNet (Miller, 
1990) in our system. First, it introduces the idea 
of generalization; then it describes our Generaliza- 
tion Tree (GT) model based on WordNet and
illustrates how GT controls the degree of generaliza- 
tion according to the user's needs. Finally it demon- 
strates some preliminary results from the experiment 
of applying GT in our trainable information extrac- 
tion system. 
2 Lexical Acquisition 
One way to achieve lexical acquisition is to use
existing repositories of lexical knowledge, such as
knowledge bases, dictionaries, and thesauri. The
key issue is whether those repositories can be effec-
tively applied for computational purposes. Many
researchers have taken steps toward successful ex-
traction of computationally useful lexical informa-
tion from machine-readable dictionaries and its con-
version into formal representations (Montemagni and
Vanderwende, 1993) (Byrd et al., 1987) (Jensen
and Binot, 1987). Sparck Jones's pioneering re-
search (Jones, 1985), done in the early 1960s, proposed
a lexical representation by synonym list. Very close
to that proposal, George Miller and colleagues at 
Princeton University constructed a large-scale re- 
source for lexical information: WordNet.
The most useful feature of WordNet for the Natural
Language Processing community is its organization
of lexical information in terms of word meanings,
rather than word forms. It is organized by parts of
speech: nouns, verbs, adjectives, and adverbs. Each
entry in WordNet is a concept represented by a
list of synonyms (a synset). The information is rep-
resented in the form of semantic networks. For in-
stance, in the network for nouns, there are "part
of", "is-a", and "member of" relationships between
concepts. Philip Resnik has studied lexical rela-
tionships by use of the WordNet taxonomy. He wrote
that "...it is difficult to ground taxonomic represen-
tations such as WordNet in precise formal terms,
the use of the WordNet taxonomy makes reason-
ably clear the nature of the relationships being rep-
resented." (Resnik, 1993).
Some early work applying WordNet to lexi-
cal semantic acquisition can be found in NYU's
MUC-4 system (Grishman et al., 1992), which
used WordNet hierarchies for semantic classification.
However, they ran into the problem of automated 
sense disambiguation because the WordNet hierar- 
chy is sense dependent. Ralph Grishman and his 
group at NYU reached the conclusion that "Word- 
Net may be a good source of concepts, but that it 
will not be of net benefit unless manually reviewed 
with respect to a particular application" (Grish- 
man et al., 1992). Other research concerns using
WordNet senses to tag large corpora with lexi-
cal semantics for automated word sense disambigua-
tion (Ng, 1997) (Wiebe et al., 1997).
3 Application of WordNet in the 
System 
Our system contains three major processes which, 
respectively, address training, rule generalization, 
and the scanning of new information. WordNet is 
used in all three processes as shown in figure 1. 
During the training process, each article is par- 
tially parsed and segmented into Noun Phrases, Verb 
Phrases and Prepositional Phrases. An IBM Lan- 
guageWare English Dictionary and Computing Term 
Dictionary, a Partial Parser, a Tokenizer and a Pre- 
processor are used in the parsing process. The To- 
kenizer and the Preprocessor are designed to iden- 
tify some special categories such as e-mail addresses,
phone numbers, and state and city names.
[Figure 1: The Use of WordNet in the System]

[Figure 2: The Sense Distribution (frequency vs. sense number)]

The user, with
the help of a graphical user interface (GUI), scans a
parsed sample article and indicates a series of se- 
mantic net nodes and transitions that he or she 
would like to create to represent the information 
of interest. Specifically, the user designates those 
noun phrases in the article that are of interest and 
uses the interface commands to translate them into 
semantic net nodes. Furthermore, the user desig- 
nates verb phrases and prepositions that relate the 
noun phrases and uses commands to translate them 
into semantic net transitions between nodes. In the 
process, the user indicates the desired translation of 
the specific information of interest into semantic net 
form that can easily be processed by the machine. 
When the user takes the action to create the seman- 
tic transitions, a Rule Generator keeps track of the 
user's moves and creates the rules automatically. 
WordNet is used to provide the sense informa- 
tion during the training. For each headword in 
a noun/verb phrase, many senses are available in 
WordNet. We trained on 24 articles with 1129 head-
words from the "triangle.jobs" domain, and found that
91.7% of headwords were used as sense one in Word- 
Net. The sense distribution is shown in figure 2. 
Based on this observation, by default, the system 
assigns sense one to every headword, while provid- 
ing the user the option to train a sense other than
one. For example, "opening" often appears in the
job advertisement domain. But instead of being used in
the first sense, {opening, gap}, it is usually used in the
fourth sense, {opportunity, chance}. The user needs to train
"opening" to be sense four during the training pro- 
cess. The Sense Table keeps a record of these head-
words and their most frequently used senses (other 
than one). 
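As an illustration of this default-sense policy, a minimal
Python sketch of the Sense Table lookup follows; the table
contents and names are our own illustration, not the system's
actual code:

    # Hypothetical Sense Table: headwords whose dominant sense in the
    # domain is not sense one, mapped to that trained sense number.
    SENSE_TABLE = {"opening": 4}   # trained: {opportunity, chance}

    def assign_sense(headword):
        # Default every headword to WordNet sense one unless the
        # user trained a different dominant sense for it.
        return SENSE_TABLE.get(headword, 1)

    assert assign_sense("opening") == 4
    assert assign_sense("programmer") == 1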
Rules created from the training process are spe- 
cific to the training articles and must be generalized 
before being applied to other articles in the domain.
According to different requirements from the user, in 
the rule generalization process, a rule optimization 
engine, based on WordNet, generalizes the specific 
rules and forms a set of optimized rules for process- 
ing new information. This rule generalization pro- 
cess will be described in the later sections. 
During the scanning of new information, with the 
help of a rule matching routine, the system applies
the optimized rules to a large number of unseen ar-
ticles from the domain. If a headword is not in the
Sense Table, sense one in WordNet is assigned;
otherwise, the Sense Table provides its most
frequently used sense in the domain. The output of
the system is a set of semantic transitions for each 
article that specifically extract information of inter- 
est to the user. Those transitions can then be used 
by a Postprocessor to fill templates, answer queries, 
or generate abstracts. 
4 Rule Generalization 
The Rule Generalization engine is crucial to the 
whole system because it makes the customizing pro- 
cess easier. The user only needs to train on a compa- 
rably small amount of data from the domain, and the 
system will automatically revise the rules to make 
them applicable to large amounts of new informa-
tion. 
4.1 Rules 
In a typical information extraction task, the most in-
teresting parts are the events and the relationships
holding among them (Appelt et al., 1995). These rela-
tionships are usually specified by verbs and preposi-
tions. Based on this observation, the left hand side 
(LHS) of our information extraction rules is made 
up of three entities. The first and the third entities 
are the target objects in the form of noun phrases;
the second entity is the verb or prepositional phrase
indicating the relationship between the two objects. 
The right hand side (RHS) of the rule consists of the
operations required to create a semantic transition:
ADD_NODE and ADD_RELATION. ADD_NODE adds
an object to the transitions. ADD_RELATION adds
a relationship between two objects. The
specific rule generated from the training process is
shown in figure 3, rule 1.
Rule 1 in figure 3 is very specific, and it can be 
activated only by a sentence with the same pattern 
as "DCR Inc. is looking for C programmers... ". 
It will not be activated by other sentences such as
"IBM Corporation seeks job candidates in Louisville, 
KY with HTML experience". Semantically speak- 
ing, these two sentences are very much alike. Both 
of them are about a company that seeks some kind 
of person. However, without generalization, the sec- 
ond sentence will not be processed. So the use of the 
specific rule is very limited. 
In order to make the specific rules applicable to 
a large number of unseen articles in the domain, 
a comprehensive generalization mechanism is nec- 
essary. We are not only interested in the general- 
ization itself, but also in a strategy to control the 
degree of generalization for various applications in 
different domains. 
4.2 Generalization Scheme 
The hierarchical organization of WordNet by word 
meanings (Miller, 1990) provides the opportu- 
nity for automated generalization. With the large 
amount of information in semantic classification and 
taxonomy provided in WordNet, many ways of in- 
corporating WordNet semantic features with gener- 
alization are foreseeable. At this stage, we only con- 
centrate on the hypernym/hyponym feature.
A hyponym is defined in (Miller et al., 1990a) as 
follows: " A noun X is said to be a hyponym of a 
noun Y if we can say that X is a kind of Y. This 
relation generates a hierarchical tree structure, i.e., 
a taxonomy. A hyponym anywhere in the hierarchy 
can be said to be "a kind of" all of its superordi-
nates...." If X is a hyponym of Y, then Y is a
hypernym of X. 
From the training process, the specific rules con- 
tain three entities on the LHS. An abstract specific 
rule is shown in rule 2 in figure 3. Each entity (sp) 
is a quadruple, in the form of (w, c, s, t), where w is 
the headword of the trained phrase; c is the part of
speech of the word; s is the sense number rep-
resenting the meaning of w; t is the semantic type 
identified by the preprocessor for w. 
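For concreteness, the quadruple and a rule's LHS might be
modeled as below; this is a minimal Python sketch of our reading
of the rule format, with all names hypothetical:

    from typing import NamedTuple

    class SP(NamedTuple):
        w: str   # headword of the trained phrase
        c: str   # part of speech, e.g. "NG" (noun group), "VG" (verb group)
        s: int   # WordNet sense number
        t: str   # semantic type from the preprocessor, e.g. "company"

    # The LHS of rule 1 in figure 3 as data:
    lhs = (SP("DCR Inc", "NG", 1, "company"),
           SP("look_for", "VG", 1, "other_type"),
           SP("programmer", "NG", 1, "other_type"))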
1. An Example of the Specific Rule:
[DCR Inc., NG, 1, company], [look_for, VG, 1, other_type], [programmer, NG, 1, other_type]
ADD_NODE(DCR Inc.), ADD_NODE(programmer),
ADD_RELATION(look_for, DCR Inc., programmer)

2. An Abstract Specific Rule:
(w1, c1, s1, t1), (w2, c2, s2, t2), (w3, c3, s3, t3)
ADD_NODE(w1), ADD_NODE(w3), ADD_RELATION(w2, w1, w3)

3. An Abstract Generalized Rule:
(W1, C1, S1, T1) ∈ Generalize(sp1, h1), (W2, C2, S2, T2) ∈ Generalize(sp2, h2),
(W3, C3, S3, T3) ∈ Generalize(sp3, h3)
ADD_NODE(W1), ADD_NODE(W3), ADD_RELATION(W2, W1, W3)

4. An Example of the Most General Rule:
(W1, C1, S1, T1) ∈ {group, ...}, (W2, C2, S2, T2) ∈ {look_for, ...}, (W3, C3, S3, T3) ∈ {entity, ...}
ADD_NODE(W1), ADD_NODE(W3), ADD_RELATION(W2, W1, W3)

Figure 3: Sample Rules

For each sp = (w, c, s, t), if w exists in WordNet,
then there is a corresponding synset in WordNet. 
The hyponym/hypernym hierarchical structure pro- 
vides a way of locating the superordinate concepts of 
sp. By following additional Hypernymy, we will get 
more and more generalized concepts and eventually 
reach the most general concept, such as {entity}. 
Based on this scenario, for each concept, different 
degrees of generalization can be achieved by adjust- 
ing the distance between this concept and the most 
general concept in the WordNet hierarchy (Bagga et 
al., 1997). The function to accomplish this task is 
Generalize(x,h), which returns a synset list h levels 
above the concept z in the hierarchy. 
WordNet is an acyclic graph structure, which means
that a synset might have more than one hypernym.
However, this situation doesn't happen often. We
tested 150 randomly chosen articles from the "tri-
angle.jobs" newsgroup. In total there were 12115
phrases, including 1829 prepositions, 1173 phrases
with headwords not in WordNet, and 9113 phrases
with headwords in WordNet. Within the 9113 head-
words, 722 headwords (7.9%), either themselves or
their hypernyms, had more than one superordinate.
Furthermore, 90% of the 722 cases came from two su-
perordinates of {person, individual, someone, mortal,
human, soul}, which are {life_form, organism, be-
ing, living thing} and {causal agent, cause, causal
agency}. Certainly, in some cases, {person...} is a
kind of {causal agent...}, but identifying it as a hy-
ponym of {life_form...} also makes sense. Based
on this observation, for the sake of simplicity, the sys-
tem selects the first superordinate if more than one
is presented.
The process of generalizing rules consists of re- 
placing each sp = (w,c,s,t) in the specific rules by 
a more general superordinate synset from its hy- 
pernym hierarchy in WordNet by performing the 
Generalize(sp, h) function. The degree of general-
ization for rules varies with h in Generalize(sp, h).
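Assuming the WordNet noun hierarchy is available, Generalize(sp, h)
can be sketched with NLTK's WordNet interface (our tooling choice,
not the paper's); the sketch also applies the first-superordinate
simplification described above:

    from nltk.corpus import wordnet as wn

    def generalize(synset, h):
        # Climb h levels up the hypernym hierarchy, taking the first
        # hypernym whenever WordNet lists more than one.
        current = synset
        for _ in range(h):
            hypernyms = current.hypernyms()
            if not hypernyms:
                break          # reached a top-level concept early
            current = hypernyms[0]
        return current

    prog = wn.synsets("programmer", pos=wn.NOUN)[0]   # sense one
    print(generalize(prog, 2))   # a concept on the path toward {person}, {entity}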
Rule 3 in figure 3 shows an abstract generalized
rule. The ∈ symbol signifies the subsumption rela-
tionship. Therefore, a ∈ b signifies that a is sub-
sumed by b, or, in WordNet terms, concept b is a
superordinate concept of concept a. The generalized
rule states that the RHS of the rule gets executed if
all of the following conditions hold: 
• A sentence contains three phrases (not neces-
sarily contiguous) with headwords W1, W2, and
W3.

• The quadruples corresponding to these head-
words are (W1, C1, S1, T1), (W2, C2, S2, T2), and
(W3, C3, S3, T3).

• The synsets, in WordNet, corresponding to the
quadruples, are subsumed by Generalize(sp1,
h1), Generalize(sp2, h2), and Generalize(sp3,
h3) respectively.

[Figure 4: Rule Generalization Process — a specific rule is
generalized to the most general rule, which is then tuned into an
optimized rule using the database of activating objects and the
user's requirement for precision (threshold)]
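The subsumption test in the last condition amounts to checking
whether the generalized synset lies on a hypernym path above the
headword's synset; a minimal sketch, again using NLTK's WordNet
interface as an assumption:

    from nltk.corpus import wordnet as wn

    def subsumed_by(synset, ancestor):
        # a ∈ b: b appears somewhere on a hypernym path above a.
        return any(ancestor in path for path in synset.hypernym_paths())

    programmer = wn.synsets("programmer", pos=wn.NOUN)[0]
    person = wn.synset("person.n.01")
    print(subsumed_by(programmer, person))   # True: a programmer is a person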
5 Generalization Tree 
The generalization degree is adjustable by the user. 
Rules with different degrees of generalization on 
their different constituents will have a different be- 
havior when processing new texts. Within a par- 
ticular rule, the user might expect one entity to be 
relatively specific and the other entity to be more 
general. For example, if a user is interested in find- 
ing all DCR Inc. related jobs, he/she might want 
to hold the first entity as specific as that in rule 1 
in figure 3, and generalize the third entity. We have 
designed a Generalization Tree (GT) to control the 
generalization degree. 
The rule generalization process with the help of 
GT is illustrated in figure 4. Each specific rule (as
shown in rule 1 in figure 3) is generalized to its most
general form (as shown in rule 4 in figure 3) by a
generalization engine based on WordNet. Specifi- 
cally, the generalization engine generalizes noun en- 
tities in the specific rule to their top hypernym in 
the hierarchies. The most general rule is applied 
again to the training corpus and some transitions 
are created. Some transitions are relevant, while 
others are not. Then the user employs our system to 
classify the created transitions as either acceptable 
or not. The statistical classifier calculates the rele- 
vancy_rate for each object, which will be described 
later. A database is maintained to keep the rele- 
vancy information for all the objects which activate 
the most general concept in the most general rule. 
This database is later automatically transformed to 
the GT structure. While maintaining the semantic 
relationships of objects as in WordNet, GTs collect 
the relevancy information of all activating objects 
and find the optimal level of generalization to fit 
the user's needs. The system will automatically ad- 
just the generalization levels for each noun entity to 
match the desires of the user. The idea of this op- 
timization process is to first keep recall as high as 
possible by applying the most general rules, then ad- 
just the precision by tuning the rules based on the 
user's specific inputs. 
5.1 An Example of GT 
Suppose we apply rule 4 in figure 3 to the train-
ing corpus, and the third entity in the rule is acti-
vated by the set of objects shown in table 1. From
a user interface and a statistical classifier, the rele-
vancy_rate (rel) for each object can be calculated:

    rel(obj) = (count of obj being relevant) /
               (total count of occurrences of obj)
As shown in table 1, for example, rel({analyst...}) =
80%, which indicates that when {entity} in the
most general rule is activated by analyst, 80% of
the time it hits relevant information and 20% of the
time it hits irrelevant information. On the other hand,
it suggests that if {entity} is replaced by the con-
cept {analyst...}, a roughly 80% precision could be
achieved in extracting the relevant information. The
corresponding GT for table 1 is shown in figure 5.
object        sense  hypernym list                              depth  rel_rate  count
analyst       1      {analyst} ⇒ {expert} ⇒ {individual}          4     80%       5
                     ⇒ {life form} ⇒ {entity}
candidate     2      {candidate} ⇒ {applicant} ⇒ {individual}     4     100%      3
                     ⇒ {life form} ⇒ {entity}
individual    1      {individual} ⇒ {life form} ⇒ {entity}        2     100%      5
participant   1      ...                                          5     0%        1
professional  1      ...                                          4     100%      2
software      1      ...                                          5     0%        1
...           ...    ...                                          ...   ...       ...

Table 1: Sample Database for Objects Activating {entity}

[Figure 5: An Example of Generalization Tree — root {entity}
(count = 17, rel = 82.3%), with one branch through {object, ...}
(rel = 0%) and another through {life form, organism, ...}
(count = 16, rel = 87.5%) ⇒ {person, individual, ...} (count = 16,
rel = 87.5%) down to the activating objects of table 1]

In GT, each activating object is a leaf node in
the tree, with an edge to its immediate hypernym
(parent). For each hypernym list in the database,
there is a corresponding path in GT. Besides the or-
dinary hard edges represented by solid lines, there
are soft edges represented by dotted lines. Con-
cepts connected by a soft edge are the same con-
cept; the only difference between them is that the
leaf node is an actual activating object, while the
internal node is the hypernym for other activating
objects. Hard edges and soft edges differ only in
representation; for the calculation, they are treated
the same. Each node has two other fields: counts
of occurrence and relevancy_rate. For leaf nodes,
those fields can be filled from the database directly.
For internal nodes, the instantiation of these fields
depends on their hyponym (children) nodes. The
calculation will be described later.
If the relevancy_rate for the root node {entity} is
82.3%, it indicates that, with probability 82.3%,
objects which activate {entity} are relevant. If the
user is satisfied with this rate, then it's not neces-
sary to alter the most general concept in the rule.
If the user feels the estimated precision is too low,
the system will go down the tree and check the rel-
evancy_rate at the next level. For example, if 87.5%
is good enough, then the concept {life form, organ-
ism...} will substitute for {entity} in the most general
rule. If the precision is still too low, the system will
go down the tree, find {adult...}, {applicant...}, and
replace the concept {entity} in the most general rule
with the union of these two concepts.
5.2 Generalization Tree Model 
                object 1   relation   object 2
specific        x_0        y_0        z_0
more general    x_i        ...        z_j
most general    x_n        ...        z_m

Table 2: Concepts in the rule
For the sake of simplicity, let's use x_i, y_i, z_i to repre-
sent the rule constituents: object one, relation, and ob-
ject two respectively. As shown in table 2, x_0, y_0,
z_0 are the concepts from the specific rule. At the
moment, we only consider generalization on the
objects. x_i and z_j are more general concepts than
x_0 and z_0: x_i is the hypernym of x_{i-1} (i ≤ n), and z_j is
the hypernym of z_{j-1} (j ≤ m). x_n and z_m are the
most general concepts for object one and object two
respectively.
For each object concept, a corresponding GT is
created. Suppose x_n is activated by q concepts
e_1^0, e_2^0, ..., e_q^0, and the times of activation for each
e_i^0 are represented by c_i. Since e_i^0 (i ≤ q) activates
x_n, there exists a hypernym list e_i^0 ⇒ e_i^1 ⇒ ... ⇒ x_n
in WordNet, where e_i^j is the immediate hypernym
of e_i^{j-1}. The system maintains a database of activation
information as shown in table 3, and builds GT from
this database automatically.
GT is an n-ary branching tree structure with the
following properties:

• Each node represents a concept, and each edge
represents the hypernym relationship between two
concepts. If e_i is the immediate hypernym of e_j,
then there is an edge between nodes e_i and e_j,
and e_i is one level above e_j in the tree.

• The root node x_n is the most general concept
from the most general rule.

• The leaf nodes e_1^0, e_2^0, ..., e_q^0 are the concepts
which activate x_n. The internal nodes are the
concepts e_i^j (j ≠ 0 and 1 ≤ i ≤ q) from the
hypernym paths for the activating concepts.

• Every leaf node e_i^0 has three fields: the concept
itself e_i^0, counts, and relevancy_rate, which can be
obtained from the database directly:

    counts(e_i^0) = c_i
    relevancy_rate(e_i^0) = r_i

• Every internal node e has three fields: the concept
itself e, counts(e), and relevancy_rate(e). For an
internal node e, if it has n hyponyms e_1, ..., e_n, then:

    counts(e) = Σ_{i=1}^{n} counts(e_i)

    relevancy_rate(e) = Σ_{i=1}^{n} P(e_i) · relevancy_rate(e_i)

  where

    P(e_i) = counts(e_i) / counts(e)
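A minimal Python sketch of these two formulas (class and field
names are ours, not the paper's):

    class GTNode:
        # A GT node: a concept plus the counts and relevancy_rate fields.
        def __init__(self, concept, counts=0, rel=0.0):
            self.concept = concept
            self.children = []
            self.counts = counts     # for leaves, c_i from the database
            self.rel = rel           # for leaves, r_i from the database

    def fill_internal_fields(node):
        # Bottom-up pass: counts(e) sums over the children, and
        # relevancy_rate(e) is the count-weighted average of theirs.
        # Assumes every child was activated at least once.
        if not node.children:        # leaf: values come from the database
            return
        for child in node.children:
            fill_internal_fields(child)
        node.counts = sum(c.counts for c in node.children)
        node.rel = sum(c.counts * c.rel for c in node.children) / node.counts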
5.3 Searching GT 
Depending on the user's needs, a threshold θ
is pre-selected. The system will start from the root
node, go down the tree, and find all the nodes e_i
such that relevancy_rate(e_i) > θ. If a node's rele-
vancy_rate is higher than θ, its hyponym (children)
nodes will be ignored. In this way, the system main-
tains a set of concepts whose relevancy_rate is higher
than θ. By substituting x_n in the most general rule
with this set of concepts, an optimized rule is created
to meet the user's needs.
The searching algorithm is a recursive top-down
search, as follows:

1. Initialize Optimal-Concepts to be the empty set.
Pre-select the threshold θ. If the user wants
to get the relevant information and particularly
cares about precision, θ should be set high;
if the user wants to extract as much infor-
mation as possible and does not care about
precision, θ should be set low.
2. Starting from the root node x_n, perform the
Recursive-Search algorithm, which is defined as
follows:

Recursive-Search(concept x)
{
    if (rel(x) > θ) {
        put x into the Optimal-Concepts set;
        return;
    }
    else {
        let m denote the number of children nodes of x;
        let x_i denote the i-th child of x (1 ≤ i ≤ m);
        for (i = 1; i ≤ m; i++)
            Recursive-Search(x_i);
    }
}
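The same search, expressed over the GTNode sketch from section
5.2 (again our own illustration, not the system's code):

    def optimal_concepts(node, theta, result=None):
        # Collect the highest nodes whose relevancy_rate clears theta;
        # once a node qualifies, its children are pruned.
        if result is None:
            result = []
        if node.rel > theta:
            result.append(node.concept)
        else:
            for child in node.children:
                optimal_concepts(child, theta, result)
        return result

Substituting x_n in the most general rule with the returned set of
concepts yields the optimized rule.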
5.4 Experiment and Discussion 
An experiment is conducted to test the applicabil- 
ity of GT in automatic information extraction. We 
trained our system on 24 articles from the trian-
gle.jobs USENET newsgroup, and created 25 spe-
cific rules concerning the job position/title informa-
tion. For example, in "DCR is looking for software
engineer", software engineer is the position name.
The specific rules then were generalized to their most 
general forms, and were applied again to the training 
set. After the user's selection of the relevant tran- 
sitions, the system automatically generated a GT 
for each most general concept in the most general 
rule. We predefined the threshold to be 0.2, 0.4, 0.5, 
0.6, 0.8, 0.9 and 1.0. Based on the different thresh- 
olds, the system generated different sets of optimized 
rules. Those rules were then applied to 85 unseen
articles from the domain.

activating object   sense   counts   hypernym list                   depth   relevancy_rate
e_1^0               s_1     c_1      e_1^0 ⇒ e_1^1 ⇒ ... ⇒ x_n        d_1     r_1
e_2^0               s_2     c_2      e_2^0 ⇒ e_2^1 ⇒ ... ⇒ x_n        d_2     r_2
...                 ...     ...      ...                              ...     ...
e_q^0               s_q     c_q      e_q^0 ⇒ e_q^1 ⇒ ... ⇒ x_n        d_q     r_q

Table 3: Database of activating concepts
The evaluation process consists of the following
steps: first, each unseen article is studied to see if
there is position/title information presented in the
article; second, the semantic transitions produced
by the system are examined to see if they correctly
extract the position/title information. Precision is
the number of transitions created which contain
position/title information out of the total number
of transitions produced by the system; recall is the
number of articles whose position/title information
has been correctly extracted out of the total number
of articles with position/title information.
The overall performance of recall and precision is
defined by the F-measurement (Chinchor, 1992), which is

    F = ((β² + 1.0) · P · R) / (β² · P + R)

where P is precision, R is recall, and β = 1 if precision
and recall are equally important. The precision,
recall, and F-measurement curves with respect to the
threshold for relevancy_rate are shown in figure 6.
The detailed result is shown in table 4. 
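As a check of the formula with β = 1: at threshold 0.6, table 4
gives P = 82.4% and R = 73.3%, so F = (2 · 0.824 · 0.733) /
(0.824 + 0.733) ≈ 77.6%, matching the table.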
[Figure 6: recall, precision, and F-measurement vs. threshold
for the relevancy_rate]
The recall achieves its highest value of 81.3% when
θ = 0.2. It gradually declines and reaches 66.7% when
θ = 1.0. As expected, the precision increases when
θ goes up. It ranges from very low at 33.3% (θ =
0.2) to very high at 94.6% (θ = 1.0). The overall
performance F-measurement goes up from 47.2% to
78.7% as θ increases. The result is consistent
with our expectation. When the threshold is high,
more tuning of the rules needs to be done, and the
system is expected to perform better.
Some problems were detected which prevent bet-
ter performance of the system. The current do-
main is a newsgroup, where anyone can post any-
thing he/she believes is relevant to the news-
group. It is inevitable that some typographical er-
rors and abbreviations occur in the articles, and
the format of an article is sometimes unpredictable.
The system performance is also hurt by errors in
the partial parsing.
In the experiment, we found that WordNet has 
about 90% coverage of verbs and nouns in this do- 
main. Most nouns not in WordNet are proper nouns,
which in this domain are mostly company names and
software names. This problem is solved by our Pre-
processor, which identifies proper nouns as belonging
to several semantic types, such as company name,
software name, city name, and so on. However, some
important domain-specific nouns may not exist in
WordNet. It would be nice if WordNet could pro-
vide a friendly interface for users to add new words
and create the links for their own applications.
For computational purposes, WordNet is well devel-
oped: finding hypernyms, synonyms, etc. is very effi-
cient. Training senses in the training process solves
most problems of sense disambiguation. How-
ever, some problems still remain. For example, if
"better" is not trained in the training process, then
by default it will be assigned sense one, which is
a subtype of a person. The hypernym list of "bet-
a subtype of a person. The hypernym list of "bet- 
ter" with sense one is {better} =~ {superior} 
{religion} ~ {Religionist} =~ {person}. But 
in the sentence "This position requires experience 
with 5.0 or better", "better" should be used as 
sense two as in the hypernym list {better} =~ 
{good, goodness} :-~ {asset, plus) ~ {quality} 
{attribute} ~ {abstraction} . Despite occasional 
sense disambiguation problem, generally, WordNet 
provides a good method to achieve generalization in 
68 
threshold 0.2 0.4 
Precision 33.3% 49.7% 
Recall 81.3% 80.0% 
F-measurement 47.2% 61.3% 
0.5 
51.0% 
80.0% 
62.3% 
0.6 0.8 0.9 1.0 l 
82.4% 94.6% 94.6% 94.6% 
73.3% 66,7% 66.7% 66.7% 
77.6% 78.7% 78.7% 78.7% 
Table 4: Precision/Recall/F-measurement wrt. threshold of relevancy_rate 
this domain. 
6 Conclusion and Future Work 
This paper describes a rule generalization approach 
by using Generalization Tree and WordNet for infor- 
mation extraction. The rule generalization makes 
the customization process easier. The Generaliza- 
tion Tree algorithm provides a way to make the sys- 
tem adaptable to the user's needs. The idea of first 
achieving the highest recall with low precision, then 
adjusting the precision according to the user's needs has been success-
ful. We are currently studying how to enhance the 
system performance by further refining the general- 
ization approach. 

References 
Aberdeen, John, John Burger, David Day, Lynette
Hirschman, Patricia Robinson, and Marc Vi- 
lain 1995. MITRE: Description of the ALEM- 
BIC System Used for MUC-6, Proceedings of the 
Sixth Message Understanding Conference (MUC- 
6), pp. 141-155, November 1995. 
Appelt, Douglas E., Jerry R. Hobbs, John Bear,
David Israel, Megumi Kameyama, Andy Kehler, 
David Martin, Karen Myers, and Mabry Tyson 
1995. SRI International: Description of the FAS- 
TUS System Used for MUC-6, Proceedings of the 
Sixth Message Understanding Conference (MUC- 
6), pp. 237-248, November 1995. 
Bagga, Amit, Joyce Y. Chai, and Alan W. Biermann
1997. The Role of WordNet in the Creation of
a Trainable Message Understanding System. To
appear at The Innovative Applications of Artificial
Intelligence Conference, 1997.
Byrd, Roy, Nicoletta Calzolari, Martin Chodorow,
Judith Klavans, and Mary Neff 1987. Tools and
methods for computational linguistics, Computa-
tional Linguistics, 13(3-4), pp. 219-240, 1987.
Chai, Joyce Y. and Alan W. Biermann 1997. A
WordNet Based Rule Generalization Engine For
Meaning Extraction. Submitted to Tenth Interna-
tional Symposium On Methodologies For Intelli-
gent Systems, 1997.
Chinchor, Nancy 1992. MUC-4 Evaluation Metrics,
Proceedings of the Fourth Message Understand-
ing Conference (MUC-4), June 1992, San Mateo:
Morgan Kaufmann.
Fisher, David, Stephen Soderland, Joseph Mc- 
Carthy, Fangfang Feng and Wendy Lehnert. 1995. 
Description of the UMass System as Used for 
MUC-6, Proceedings of the Sixth Message Un-
derstanding Conference (MUC-6), pp. 127-140, 
November 1995. 
Grishman, Ralph, Catherine Macleod, and John 
Sterling 1992. New York University Descrip- 
tion of the Proteus System as Used for MUC-4, 
Proceedings of the Fourth Message Understanding 
Conference (MUC-4), pp. 223-241, June 1992.
Grishman, Ralph 1995. The NYU System for MUC- 
6 or Where's the Syntax? Proceedings of the 
Sixth Message Understanding Conference (MUC- 
6), pp. 167-175, November 1995. 
Jensen, Karen, Jean-Louis Binot 1987. Disam- 
biguating Prepositional Phrase Attachments by 
Using On-line Dictionary Definitions Computa- 
tional Linguistics, 13(3) pp. 251-260, 1987. 
Jones, Sparck 1985. Synonymy and Semantic Clas- 
sification, Edinburgh University Press, 1985.
Krupka, George R. 1995. Description of the SRA
System as Used for MUC-6, Proceedings of the 
Sixth Message Understanding Conference (MUC- 
6), pp. 221-235, November 1995. 
Miller, George A. 1990. Introduction to WordNet: 
An On-Line Lexical Database. WordNet Manuals, 
pp. 10-32, August 1993. 
Miller, George A., et al. 1990a. Five Papers on 
WordNet, Cognitive Science Laboratory, Prince- 
ton University, No. 43, July 1990. 
Montemagni, Simonetta, and Lucy Vanderwende 1993.
Structural Patterns versus String Patterns for Ex-
tracting Semantic Information from Dictionaries,
Natural Language Processing: The PLNLP Ap-
proach, pp. 149-159, 1993.
Ng, Hwee Tou 1997. Getting Serious about Word
Sense Disambiguation, ACL/SIGLEX Workshop
on Tagging Text with Lexical Semantics, pp. 1-7,
Washington DC, April, 1997.
Resnik, Philip 1993. Selection and Information: A
Class-Based Approach to Lexical Relationships,
Ph.D. Dissertation, University of Pennsylvania,
1993.
Weischedel, Ralph 1995. BBN: Description of the 
PLUM System as Used for MUC-6, Proceedings 
of the Sixth Message Understanding Conference 
(MUC-6), pp. 55-69, November 1995. 
Wiebe, Janyce, Julie Maples, Lei Duan, and Re-
becca Bruce 1997. Experience in WordNet Sense
Tagging in the Wall Street Journal, ACL/SIGLEX
Workshop on Tagging Text with Lexical Seman-
tics, pp. 1-7, Washington DC, April, 1997.
