Domain-Specific Knowledge Acquisition from Text 
Dan Moldovan, Roxana Girju and Vasile Rus 
Department of Computer Science and Engineering 
University of Southern Methodist University 
Dallas, Texas, 75275-0122 
{ moldovan, roxana, rus} @seas.smu.edu 
Abstract 
In many knowledge intensive applications, it is nec- 
essary to have extensive domain-specific knowledge 
in addition to general-purpose knowledge bases. 
This paper presents a methodology for discovering 
domain-specific concepts and relationships in an at- 
tempt to extend WordNet. The method was tested 
on five seed concepts selected from the financial 
domain: interest rate, stock market, inflation, eco- 
nomic growth, and employment. 
1 Desiderata for Automated 
Knowledge Acquisition 
The need for knowledge 
The knowledge is infinite and no matter how large a 
knowledge base is, it is not possible to store all the 
concepts and procedures for all domains. Even if 
that were possible, the knowledge is generative and 
there are no guarantees that a system will have the 
latest information all the time. And yet, if we are to 
build common-sense knowledge processing systems 
in the future, it is necessary to have general-purpose 
and domain-specific knowledge that is up to date. 
Our inability to build large knowledge bases without 
much effort has impeded many ANLP developments. 
The most successful current Information Extrac- 
tion systems rely on hand coded linguistic rules rep- 
resenting lexico-syntactic patterns capable of match- 
ing natural  expressions of events. Since 
the rules are hand-coded it is difficult to port sys- 
tems across domains. Question answering, inference, 
summarization, and other applications can benefit 
from large linguistic knowledge bases. 
The basic idea 
A possible solution to the problem of rapid develop- 
ment of flexible knowledge bases is to design an au- 
tomatic knowledge acquisition system that extracts 
knowledge from texts for the purpose of merging it 
with a core ontological knowledge base. The attempt 
to create a knowledge base manually is time con- 
suming and error prone, even for small application 
domains, and we believe that automatic knowledge 
acquisition and classification is the only viable solu- 
tion to large-scale, knowledge intensive applications. 
This paper presents an interactive method that ac- 
quires new concepts and connections associated with 
user-selected seed concepts, and adds them to the 
WordNet linguistic knowledge structure (Fellbaum 
1998). The sources of the new knowledge are texts 
acquired from the Internet or other corpora. At the 
present time, our system works in a semi-automatic 
mode, in the sense that it acquires concepts and re- 
lations automatically, but their validation is done by 
the user. 
We believe that domain knowledge should not be 
acquired in a vacuum; it should expand an existent 
ontology with a skeletal structure built on consistent 
and acceptable principles. The method presented in 
this paper is applicable to any Machine Readable 
Dictionary. However, we chose WordNet because it 
is freely available and widely used. 
Related work 
This work was inspired in part by Marti Hearst's 
paper (Hearst 1998) where she discovers manually 
lexico-syntactic patterns for the HYPERNYMY rela- 
tion in WordNet. 
Much of the work in pattern extraction from texts 
was done for improving the performance of Infor- 
mation Extraction systems. Research in this area 
was done by (Kim and Moldovan 1995) (Riloff 1996), 
(Soderland 1997) and others. 
The MindNet (Richardson 1998) project at Mi- 
crosoft is an attempt to transform the Longman Dic- 
tionary of Contemporary English (LDOCE) into a 
form of knowledge base for text processing. 
Woods studied knowledge representation and clas- 
sification for long time (Woods 1991), and more re- 
cently is trying to automate the construction of tax- 
onomies by extracting concepts directly from texts 
(Woods 1997). 
The Knowledge Acquisition from Text (KAT) sys- 
tem is presented next. It consists of four parts: (1) 
discovery of new concepts, (2) discovery of new lex- 
ical patterns, (3) discovery of new relationships re- 
flected by the lexical patterns, and (4) the classifi- 
cation and integration of the knowledge discovered 
with a WordNet - like knowledge base. 
268 
2 KAT System 
2.1 Discover new concepts 
Select seed concepts. New domain knowledge can 
be acquired around some seed concepts that a user 
considers important. In this paper we focus on the 
financial domain, and use: interest rate, stock mar- 
ket, inflation, economic growth, and employment as 
seed concepts. The knowledge we seek to acquire re- 
lates to one or more of these concepts, and consists 
of new concepts not defined in WordNet and new re- 
lations that link these concepts with other concepts, 
some of which are in WordNet. 
For example, from the sentence: When the US 
economy enters a boom, mortgage interest rates rise, 
the system discovers: (1) the new concept mortgage 
interest rate not defined in WordNet but related to 
the seed concept interest rate, and (2) the state of 
the US economy and the value of mortgage interest 
rate are in a DIRECT RELATIONSHIP. 
In WordNet, a concept is represented as a synset 
that contains words sharing the same meaning. In 
our experiments, we extend the seed words to their 
corresponding synset. For example, stock market is 
synonym with stock exchange and securities market, 
and we aim to learn concepts related to all these 
terms, not only to stock market. 
Extract sentences. Queries are formed with each 
seed concept to extract documents from the Internet 
and other possible sources. The documents retrieved 
are further processed such that only the sentences 
that contain the seed concepts are retained. This 
way, an arbitrarily large corpus .4 is formed of sen- 
tences containing the seed concepts. We limit the 
size of this corpus to 1000 sentences per seed con- 
cept. 
Parse sentences. Each sentence in this corpus is 
first part-of-speech (POS) tagged then parsed. We 
use Brill's POS tagger and our own parser. The out- 
put of the POS tagger for the example above is: 
When/WRB the/DW U.~./NNP economy/NN en- 
ters/VBZ a/DT boom/NN ,/, mortgage/NN inter- 
est_rates/NNS rise/vBP ./. 
The syntactic parser output is: 
TOP (S (SBAR (WHADVP (WRB When)) (S (NP (DT 
the) (NNP U.S.) (NN economy)) (VP (VBZ enters) (NP 
(DT a) (NN boom) (, ,))))) (NP (NN mortgage) (NNS 
interest_rates)) (VP (VI3P rise))) 
Extract new concepts. In this paper only noun 
concepts are considered. Since, most likely, one- 
word nouns are already defined in WordNet, the fo- 
cus here is on compound nouns and nouns with mod- 
ifiers that have meaning but are not in WordNet. 
The new concepts directly related to the seeds are 
extracted from the noun phrases (NPs) that contain 
the seeds. In the example above, we see that the 
seed belongs to the NP: mortgage interest rate. 
This way, a list of NPs containing the seeds is 
assembled automatically from the parsed texts. Ev- 
ery such NP is considered a potential new concept. 
This is only the "raw material" from which actual 
concepts are discovered. 
In some noun phrases the seed is the head noun, 
i.e. \[word, word,..see~, where word can be a noun or 
an adjective. For example, \[interest rate\] is in Word- 
Net, but \[short term nominal interest rate\] is not in 
WordNet. Most of the new concepts related to a 
seed are generated this way. In other cases the seed 
is not the head noun i.e. \[word, word,..seed, word, 
wor~. For example \[interest rate peg\], or \[interna- 
tional interest rate differentia~. 
The following procedures are used to discover con- 
cepts, and are applicable in both cases: 
Procedure 1.1. WordNet reduction. Search NP for 
words collocations that are defined in WordNet as 
concepts. Thus \[long term interest rate\] becomes 
\[long_term interest_rate\], \[prime interest rate\] be- 
comes \[prime_interest_rate\], as all hyphenated con- 
cepts are in WordNet. 
Procedure 1.2. Dictionary reduction. For each 
NP, search further in other on-line dictionaries for 
more compound concepts, and if found, hyphen- 
ate the words. Many domain-specific dictionaries 
are available on-line. For example, \[mortgage inter- 
est_rate\] becomes \[mortgage_interest_rate\], since it is 
defined in the on-line dictionary OneLook Dictionar- 
ies (http://www.onelook.com). 
Procedure 1.3. User validation. Since currently we 
lack a formal definition of a concept, it is not possible 
to completely automate the discovery of concepts. 
The human inspects the list of noun phrases and 
decides whether to accept or decline each concept. 
2.2 Discover lexlco-syntactic patterns 
Texts represent a rich source of information from 
which in addition to concepts we can also discover 
relations between concepts. We are interested in dis- 
covering semantic relationships that link the con- 
cepts extracted above with other concepts, some of 
which may be in WordNet. The approach is to 
search for lexico-syntactic patterns comprising the 
concepts of interest. The semantic relations from 
WordNet are the first we search for, as it is only 
natural to add more of these relations to enhance 
the WordNet knowledge base. However, since the 
focus is on the acquisition of domain-specific knowl- 
edge, there are semantic relations between concepts 
other than the WordNet relations that are impor- 
tant. These new relations can be discovered auto- 
matically from the clauses and sentences in which 
the seeds occur. 
269 
Pick a semantic relation R. These can be Word- 
Net semantic relations or any other relations defined 
by the user. So far, we have experimented with the 
WordNet HYPERNYMY (or so-called IS-A) relation, 
and three other relations. By inspecting a few sen- 
tences containing interest rate one can notice that 
INFLUENCE is a frequently used relation. The two 
other relations are CAUSE and EQUIVALENT. 
Pick a pair of concepts Ci, C# among which 
R holds. These may be any noun concepts. In the 
context of finance domain, some examples of con- 
cepts linked by the INFLUENCE relation are: 
interest rate INFLUENCES earnings, or 
credit worthiness INFLUENCES interest rate. 
Extract lexico-syntactic patterns Ci :P Cj. 
Search any corpus B, different from ,4 for all in- 
stances where Ci and Cj occur in the same sentence. 
Extract the lexico-syntactic patterns that link the 
two concepts. For example~ from the sentence : The 
graph indicates the impact on earnings from several 
different interest rate scenarios, the generally appli- 
cable pattern extracted is: 
impact on NP2 from NP1 
This pattern corresponds unambiguously to the re- 
lation R we started with, namely INFLUENCE. Thus 
we conclude: INFLUENCE(NPI, NP2). 
Another example is: As the credit worthiness de- 
creases, the interest rate increases. From this sen- 
tence we extract another lexical pattern that ex- 
presses the INFLUENCE relation: 
\[as NP1 vbl, NP2 vb$\] & \[vbl and vb2 are antonyms\] 
This pattern is rather complex since it contains not 
only the lexical part but also the verb condition that 
needs to be satisfied. 
This procedure repeats for all relations R. 
2.3 Discover new relationships between 
concepts 
Let us denote with Cs the seed-related concepts 
found with Procedures 1.1 through 1.3. We search 
now corpus ,4 for the occurrence of patterns ~ dis- 
covered above such that one of their two concepts is 
a concept Cs. 
Search corpus ,4 for a pattern ~. Using a lexico- 
syntactic pattern P, one at a time, search corpus ,4 
for its occurrence. If found, search further whether 
or not one of the NPs is a seed-related concept Cs. 
Identify new concepts Cn. Part of the pattern 7 ~ 
are two noun phrases, one of which is Cs. The head 
noun from the other noun phrase is a concept Cn we 
are looking for. This may be a WordNet concept, 
and if it is not it will be added to the list of concepts 
discovered. 
Form relation R(Cs, Cn). Since each pattern 7 ~ is 
a linguistic expression of its corresponding seman- 
tic relation R, we conclude R(Cs,Cn) (this is in- 
terpreted "C8 is relation R Cn)'). These steps are 
repeated for all patterns. 
User intervention to accept or reject relationships 
is necessary mainly due to our system inability of 
handling coreference resolution and other complex 
linguistic phenomena. 
2.4 Knowledge classification and 
integration 
Next, a taxonomy needs to be created that is con- 
sistent with WordNet. In addition to creating a 
taxonomy, this step is also useful for validating the 
concepts acquired above. The classification is based 
on the subsumption principle (Schmolze and Lipkis 
1983), (Woods 1991). 
This algorithm provides the overall steps for the 
classification of concepts within the context of Word- 
Net. Figure 1 shows the inputs of the Classification 
Algorithm and suggests that the classification is an 
iterative process. In addition to WordNet, the in- 
puts consist of the corpus ,4, the sets of concepts Cs 
and Cn, and the relationships 7~. Let's denote with 
C = Cs U Cn the union of the seed related concepts 
with the new concepts. All these concepts need to 
be classified. 
Wo,aN=l C°~Tr~ A Co.=i= ~. C°, V.=~tio.~=a~\[ I R \[ I 
t Knowledge Classification ¢--k Algorithm '1 ..... ~i;\] 
Figure 1: The knowledge classification diagram 
Step 1. From the set of relationships 7"~ discovered 
in Part 3, pick all the HYPERNYMY relations. From 
the way these relations were developed, there are 
two possibilities: 
(1) A HYPERNYMY relation links a WordNet concept 
Cw with another concept from the set C denoted 
with CAw , or (2) 
A HYPERNYMY relation links a concept Cs with 
a concept Cn. 
Concepts C~w are immediately linked to Word- 
Net and added to the knowledge base. The concepts 
from case (2) are also added to the knowledge base 
but they form at this point only some isolated islands 
since are not yet linked to the rest of the knowledge 
base. 
Step 2. Search corpus `4 for all the patterns asso- 
ciated with the HYPERNYMY relation that may link 
270 
\[Asian_country interest_rate \] IIS-A TIS-A 
\[Japan discount_rate \] 
a) 
\[country interest_rate \] 
\[Japan discount_rate \] \[Germany prime interest_rate \] 
b) 
Figure 2: Relative classification of two concepts 
concepts in the set Cn with any WordNet concepts. 
Altough concepts C, are not seed-based concepts, 
they are related to at least one Cs concept via a re- 
lationship (as found in Task 3). Here we seek to find 
HYPERNYMY links between them and WordNet con- 
cepts. If such C,~ concepts exist, denote them with 
C~w. The union Chw = C~w LJ C2w represents all 
concepts from the set C that are linked to WordNet 
without any further effort. We focus now on the rest 
of concepts, Cc -- C N Chw, that are not yet linked 
to any WordNet concepts. 
Step 3. Classify all concepts in set Ce using Pro- 
cedures 4.1 through 4.5 below. 
Step 4. Repeat Step 3 for all the concepts in set 
Cc several times till no more changes occur. This 
reclassification is necessary since the insertion of a 
concept into the knowledge base may perturb the 
ordering of other surrounding concepts in the hier- 
archy. 
Step 5. Add the rest of relationships 7~ other 
than the HYPERNYMY to the new knowledge base. 
The HYPERNYMY relations have already been used 
in the Classification Algorithm, but the other rela- 
tions, i.e. INFLUENCE, CAUSE and EQUIVALENT need 
to be added to the knowledge base. 
Concept classification procedures 
Procedure 4.1. Classify a concept of the form \[word, 
head\] with respect to concept \[head\]. 
It is assumed here that the \[head\] concept exists 
in WordNet simply because in many instances the 
"head" is the "seed" concept, and because frequently 
the head is a single word common noun usually de- 
fined in WordNet. In this procedure we consider only 
those head nouns that do not have any hyponyms 
since the other case when the head has other con- 
cepts under it is more complex and is treated by 
Procedure 4.4. Here "word" is a noun or an adjec- 
tive. 
The classification is based on the simple idea 
that a compound concept \[word, head\] is onto- 
logically subsumed by concept \[head\]. For exam- 
ple, mortgage_interest_rate is a kind of interest_rate, 
thus linked by a relation nYPERNYMY(interest_rate, 
mortgage_interest_rate). 
Procedure 4.2. Classify a concept \[wordx, headx\] 
with respect to another concept \[words, head2\]. 
For a relative classification of two such concepts, the 
ontological relations between headz and head2 and 
between word1 and words, if exist, are extended to 
the two concepts. We distinguish here three possi- 
bilities: 
1. heady subsumes heads and word1 subsumes 
word2. In this case \[wordz, headl\] subsumes 
\[word2, heads\]. The subsumption may not al- 
ways be a direct connection; sometimes it may 
consist of a chain of subsumption relations since 
subsumption is (usually) a transitive relation 
(Woods 1991). An example is shown in Fig- 
ure 2a; in WordNet, Asian_country subsumes 
Japan and interest_rate subsumes discount_rate. 
A particular case of this is when head1 is iden- 
tical with head2. 
2. Another case is when there is no direct sub- 
sumption relation in WordNet between word1 
and words, and/or head1 and heads, but there 
are a common subsuming concepts, for each 
pair. When such concepts are found, pick 
the most specific common subsumer (MSCS) 
concepts of word1 and words, and of head1 
and head2, respectively. Then form a concept 
\[MSCS(wordz, words), MSCS(headl, head2)\] 
and place \[word1 headz\] and \[words heads\] un- 
der it. This is exemplified in Figure 2b. In 
WordNet, country Subsumes Japan and Ger- 
many, and interest_rate subsumes discount_rate 
and prime_interest_rate. 
3. In all other cases, no subsumption relation is es- 
tablished between the two concepts. For exam- 
ple, we cannot say whether Asian_country dis- 
count_rate is more or less abstract then Japan 
interest_rate. 
Procedure 4.3. Classify concept \[word1 words head\]. 
Several poss!bilities exist: 1. When there is already a concept \[words head\] 
in the knowledge base under the \[head\], then 
place \[wordl words head\] under concept \[words 
head\]. 
2. When there is already a concept \[wordz head\] 
in the knowledge base under the \[head\], then 
place \[wordl word2 head\] under concept \[wordl 
head\]. 
3. When both cases 1 and 2 are true then place 
\[wordz word2 head\] under both concepts. 
271 
4. When neither \[wordl head\] nor \[words head\] are 
in the knowledge base, then place \[wordl word~ 
head\] under the \[head\]. The example in Figure 
3 corresponds to case 3. 
components ;y/ 
radio components automobile components / 
automobile radio components 
Figure 3: Classification of a compound concept with respect to 
its ~ concepts 
Since we do not deal here with the sentence seman- 
tics, it is not possible to completely determine the 
meaning of \[word1 word2 head\], as it may be either 
\[((word1 word2) head)\] or \[(word1 (words head))\] of- 
ten depending on the sentence context. 
In the example of Figure 3 there is only one mean- 
ing, i.e. \[(automobile radio) components\]. However, 
in the case of ~erformance skiing equipment\] there 
are two valid interpretations, namely \[(performance 
skiing) equipment\] and ~erformance (skiing equip- 
ment)\]. 
Procedure 4.4 Classify a concept \[word1, head\] with 
respect to a concept h/erarchy under the ~aead\]. 
The task here is to identify the most specific sub- 
sumer (MSS) from all the concepts under the head 
that subsumes \[wordx, head\]. By default, \[wordl 
head\] is placed under \[head\], however, since it may 
be more specific than other hyponyms of \[head\], a 
more complex classification analysis needs to be im- 
plemented. 
In the previous work on knowledge classification 
it was assumed that the concepts were accompanied 
by rolesets and values (Schmolze and Lipkis 1983), 
(Woods 1991), and others. Knowledge classifiers are 
part of almost any knowledge representation system. 
However, the problem we face here is more diffi- 
cult. While in build-by-hand knowledge representa- 
tion systems, the relations and values defining con- 
cepts are readily available, here we have to extract 
them from text. Fortunately, one can take advantage 
of the glossary definitions that are associated with 
concepts in WordNet and other dictionaries. One 
approach is to identify a set of semantic relations 
into which the verbs used in the gloss definitions are 
mapped into for the purpose of working with a man- 
ageable set of relations that may describe the con- 
cepts restrictions. In WordNet these basic relations 
are already identified and it is easy to map every 
verb into such a semantic relation. 
As far as the newly discovered concepts are con- 
cerned, their defining relations need to be retrieved 
from texts. Human assistance is required, at least 
for now, to pinpoint the most characteristic relations 
that define a concept. 
Below is a two step algorithm that we envision for 
the relative classification of two concepts A and B. 
Let's us denote with ARaCa and BRbCb the rela- 
tionships that define concepts A and B respectively. 
These are similar to rolesets and values. 
1. Extract relations (denoted by verbs) be- 
tween concept and other gloss concepts. 
ARalC~I BRblCbl 
ARa2Ca2 BRb2Cb2 
AR,~Cam B Rbn Cb,, 
2. A subsumes B ff and only if 
(a) Relations Rai subsume Rbl, for 1 < i < m. 
(b) Col subsumes or is a meronym of Cbi. 
(c) Concept B has more relations than concept 
A, i.e. m<n. 
Example: In Figure 4 it is shown the classification 
of concept monetary policy that has been discovered. 
By default this concept is placed under policy. How- 
ever in WordNet there is a hierarchy fiscal policy - 
IS-A - economic policy - IS-A - policy. The question 
is where exactly to place monetary policy in this hi- 
erarchy. 
The gloss of economic policy indicates that it is 
MADE BY Government, and that it CONTROLS eco- 
nomic growth- (here we simplified the explanation 
and used economy instead of economic growth). The 
gloss of fiscal policy leads to relations MADE BY Gov- 
ernment, CONTROLS budget, and CONTROLS taxa- 
tion. The concept money supply was found by Pro- 
cedure 1.2 in several dictionaries, and its dictionary 
definition leads to relations MADE BY Federal Gov- 
ernment, and CONTROLS money supply. In Word- 
Net Government subsumes Federal Government, and 
economy HAS PART money. All necessary conditions 
are satisfied for economic policy to subsume mone- 
tary policy. However, fiscal policy does not subsume 
monetary policy since monetary policy does not con- 
trol budget or taxation, or any of their hyponyms. 
Procedure 4.5 Merge a structure of concepts with 
the rest of the knowledge base. 
It is possible that structures consisting of several 
inter-connected concepts are formed in isolation of 
the main knowledge base as a result of some proce- 
dures. The task here is to merge such structures with 
the main knowledge base such that the new knowl- 
edge base will be consistent with both the struc- 
ture and the main knowledge base. This is done by 
272 
po~cy 
| IS-A 
economic policy .~ ........... >" made by ........ :" government 
monetary policy = - - -> madeby 
fiscal policy : ~ .......... > made by ........ >- government 
"'. "~" controls ........ ~" budget 
"-k controls ........ :" taxation 
WordNet 
before merger 
work place 
t lS-A 
exchange 
I IS-A 
stock market 
industry 
t lS-A 
market 
IS-A 
money market 
Figure 4: Classification of the new concept monetary policy 
WordNet 
The new structure from text after merger 
 Taltarke' 
eapital~market money market 
\[IS-A 
stock market 
work place 
t lS-A 
exchange 
A~ capital market 
IS- 
stock market 
ind~su'y 
\[ IS-A 
ma\[ket 
IS-A 
financial market 
money market 
Figure 5: Merging a structure of concepts with WordNet 
bridging whenever possible the structure concepts 
and the main knowledge base concepts. It is possi- 
ble that as a result of this merging procedure, some 
HYPERNYMY relations either from the structure or 
the main knowledge base will be destroyed to keep 
the consistency. An example is shown in Figure 5. 
Example : The following HYPERNYMY relation- 
ships were discovered in Part 3: 
HYPERNYMY(financial market,capital market) 
HYPERNYMY(fInancial market,money market) 
HYPERNYMY(capital market,stock market) 
The structure obtained from these relationships 
along with a part of WordNet hierarchy is shown 
in Figure 5. An attempt is made to merge the new 
structure with WordNet. To these relations it cor- 
responds a structure as shown in Figure 5. An at- 
tempt is made to merge this structure with Word- 
Net. Searching WordNet for all concepts in the 
structure we find money market and stock market 
in WordNet where as capital market and financial 
market are not. Figure 5 shows how the structure 
merges with WordNet and moreover how concepts 
that were unrelated in WordNet (i.e. stock market 
and money market) become connected through the 
new structure. It is also interesting to notice that the 
IS-A link in WordNet from money market to market 
is interrupted by the insertion of financial market 
in-between them. 
3 Implementation and Results 
The KAT Algorithm has been implemented, and 
when given some seed concepts, it produces new con- 
cepts, patterns and relationships between concepts 
in an interactive mode. Table 1 shows the number 
of concepts extracted from a 5000 sentence corpus, 
in which each sentence contains at least one of the 
five seed concepts. 
The NPs were automatically searched in Word- 
Net and other on-line dictionaries. There were 3745 
distinct noun phrases of interest extracted; the rest 
contained only the seeds or repetitions. Most of the 
273 
\[I Relations I Lexico-syntactic Patterns Examples 
H WordNet Relations 
HYPERNYMY I NP1 \[<be>\] a kind of NP2 Thus, LIBOR is a kind of interest rate, as it is charged 
I ::~ HYPERNYMY(NPI,NP2) on deposits between banks in the Eurodolar market. 
New Relations 
CAUSE 
INFLUENCE 
NPI \[<be>\] cause NP2 
=~ CAUSE(NPI,NP2) 
NP1 impact on NP2 
INFLU~NCZ(NP1,NP2) 
As NP1 vb, so <do> NP2 
=> INFLUENCE(NPI,NP2) 
NP1 <be> associated with NP2 
=> INFLUENCE(NP1,NP2) 
INFLUENCE(NP2,NPI) 
As/if/when NP1 vbl, NP2 vb2. -{- 
vbl, vb2 ---- antonyms / go in 
opposite directions 
::~ INFLUENCE(NPI,NP2) 
the effect(s) of NP1 on/upon NP2 
::> INFLUENCE(NPI,NP2) 
inverse relationship between 
NPI and NP2 
=> INFLUENCE(NP1,NP2) 
=~ INFLUENCE(NP2,NP1) 
NP2 <be> function of NP1 
=# INFLUENCZ(NP1,NP2) 
NP1 (and thus NP2) 
:~ INFLUENCE(NPI,NP2) 
Phillips, a British economist, stated in 1958 that high inflation 
causes low unemployment rates. 
The Bank of Israel governor said that the ti;ht economic policy 
would have an immediate impact on inflation this year. 
As the economy picks up steam, so does inflation. 
Higher interest rates are normally associated with weaker bond markets. 
On the other hand, if interest rates go down, bonds go up, 
and your bond becomes more valuable. 
The effects of inflation on debtors and creditors varies as the 
actual inflation is compared to the expected one. 
There exists an inverse relationship between unemployment rates 
and inflation, best illustrated by the Phillips Curve. 
Irish employment is also largely a function of the past 
high birth rate. 
We believe that the Treasury bonds (and thus interest rates) 
are in a downward cycle. 
Table 2: Examples of lexico-syntactic patterns and semantic relations derived from the 5000 sentence corpus 
la I bl c Id I e II 
concepts (NPs) 773 382 833 921 . 
Total concepts extracted with Procedurel 
Concepts found 
in WordNet 2 0 1 0 2 
Concepts Concepts 
found in with seed 6 0 3 0 0 
on-line head 
dictionaries, Concepts 
but not in with seed 7 0 1 1 1 
WordNet not head I C°ncepts accepted 
I 
by human 78 62 58 60 37 
Table 1: Results showing the number of new concepts learned 
from the corpus related to (a) interest rate, (b) stock market, (c) 
inflation, (d) economic 9rowth, a~ld (e) employment. 
processing in Part 1 is taken by the parser. The hu- 
man intervention to accept or decline concepts takes 
about 4 min./seed. 
The next step was to search for lexico-syntactic 
patterns. We considered one WordNet semantic re- 
lation, HYPERNYMY and three other relations that 
we found relevant for the domain, namely INFLU- 
ENCE, CAUSE and EQUIVALENT. For each relation, 
a pair of related words was selected and searched 
for on the Internet. The first 500 sentences/relation 
were retained. A human selected and validated semi- 
automatically the patterns for each sentence. A sam- 
ple of the results is shown in Table 2. A total of 22 
patterns were obtained and their selection and vali- 
dation took approximately 35 minutes/relation. 
Next, the patterns are searched for on the 5000 
sentence corpus (Part 3). The procedure provided 
a total of 43 new concepts and 166 relationships 
in which at least one of the seeds occurred. From 
these relationships, by inspection, we have accepted 
63 and rejected 102, procedure which took about 7 
minutes. Table 3 lists some of the 63 relationships 
discovered. 
Relationships. Examples 
HYPEaNYMY(interest rate, LIBOR) 
HYPERNYMY(leading stock market, 
New York Stock Exchange) 
HYPERNYMY(market risks, interest rate risk) 
HYPERNYMY(Capital markets, stock markets) 
CAUSE(inflation, unemployment) 
CAUSE(labour shortage, wage inflation) 
CAUSE(excessive demand, inflation 
INFLUENCE_DIRECT_PROPORTIONALYI economy, inflation) 
INFLUENCE_DIRECT_PROPORT1ONALY settlements, interest rate) 
INFLUENCE..DIRECT..PROPORTIONALY~ U.S. interest rates, dollars) 
INFLUENCE_DIRECT_PROPORTIONALY~ oil prices, inflation) 
INFLUENCE_DIRECT_PROPORTIONALY' inflation, nominal interest rates) 
INFLUENCE..DIRECT_PROPORTIONALY~ deflation, real interest rates) 
INFLUENCE-DIRECT-PROPORTIONALY currencles,lnflation) 
INFLUENCE_INVERSE_PROPORTIONALY unemployment rates, inflation) 
INFLUENCE_INVERSE-PKOPOKTIONALY monetary policies, inflation) 
INFLUENCE_INVERSE_PROPORTIONALY economy, interest rates) 
INFLUENCE_INVERSE..PROPORTIONALY inflation, unemployment rates) 
INFLUENCE.JNVERSE-PROPORTIONALY credit worthiness, interest rate) 
INFLUENCE_INVERSE-PROPORTIONALYlinterest rates, bonds) 
INFLUENCE(Internal Revenue Service, interest rates) 
INFLUENCE(economic growth, share prices) 
EQUIVALENT(big mistakes, high inflation rates of 1970s) 
EQUIVALENT(fixed interest rate, coupon) 
Table 3: A part of the relationships derived from the 5000 
sentence corpus 
274 
4 Applications 
An application in need of domain-specific knowledge 
is Question Answering. The concepts and the rela- 
tionships acquired can be useful in answering dif- 
ficult questions that normally cannot be easily an- 
swered just by using the information from WordNet. 
Consider the processing of the following questions af- 
ter the new domain knowledge has been acquired: 
QI: What factors have an impact on the interest 
rate? 
Q2: What happens with the employment when the 
economic growth rises? 
Q3: How does deflation influence prices? 
"'"'"-... J 
ISA " ~ 
Figure 6: A sample of concepts and relations acquired from the 
5000 sentence corpus. Legend: continue lines represent influence 
inverse proportionally, dashed lines represent influence direct 
proportionally, and dotted lines represent influence (the direction of the relationship was not specified in the text). 
Figure 6 shows a portion of the new domain 
knowledge that is relevant to these questions. The 
first question can be easily answered by extracting 
the relationships that point to the concept interest 
rate. The factors that influence the interest rate are 
Fed, inflation, economic growth, and employment. 
The last two questions ask for more detailed infor- 
mation about the complex relationship among these 
concepts. Following the path from the deflation con- 
cept up to prices, the system learns that deflation in- 
fluences direct proportionally real interest rate, and 
real interest rate has an inverse proportional impact 
on prices. Both these relationships came from the 
sentence: Thus, the deflation and the real interest 
rate are positively correlated, and so a higher real 
interest rate leads to falling prices. 
This method may be adapted to acquire infor- 
mation when the question concepts are not in the 
knowledge base. Procedures may be invoked to dis- 
cover these concepts and the relations in which they 
may be used. 
5 Conclusions 
The knowledge acquisition technology described 
above is applicable to any domain, by simply select- 
ing appropriate seed concepts. We started with five 
concepts interest rate, stock market, inflation, eco- 
nomic growth, and employment and from a corpus 
of 5000 sentences we acquired a total of 362 con- 
cepts of which 319 contain the seeds and 43 relate 
to these via selected relations. There were 22 dis- 
tinct le:dco-syntactic patterns discovered used in 63 
instances. Most importantly, the new concepts can 
be integrated with an existing ontology. 
The method works in an interactive mode where 
the user accepts or declines concepts, patterns and 
relationships. The manual operation took on aver- 
age 40 minutes per seed for the 5000 sentence corpus. 
KAT is useful considering that most of the knowl- 
edge base construction today is done manually. 
Complex linguistic phenomena such as corefer- 
ence resolution, word sense disambiguation, and oth- 
ers have to be dealt with in order to increase the 
automation of the knowledge acquisition system. 
Without a good handling of these problems the re- 
sults are not always accurate and human interven- 
tion is necessary. 

References 

Christiane Fellbaum. WordNet - An Electronic Lexical Database, MIT Press, Cambridge, MA, 1998. 

Marti Hearst. Automated Discovery of WordNet Relations. In WordNet: An Electronic Lezical Database and Some of its Applications, editor Fellbaum, C., MIT Press, Cambridge, MA, 1998. 

J. Kim and D. Moldovan. Acquisition of Linguistic Patterns for knowledge-based information extraction. IEEE Transactions on Knowledge and Data Engineering 7(5): pages 713-724. 

R. MacGregor. A Description Classifier for the Predicate Calculus. Proceedings of the 12th National Conference on Artificial Intelligence (AAAI94), pp. 213-220, 1994. 

Stephen D. Richardson, William B. Dolan, Lucy Vanderwende. MindNet: acquiring and structuring semantic information from text. Proceedings of ACL-Coling 1998, pages 1098-1102. 

Ellen Riloff. Automatically Generating Extraction Patterns from Untagged Text. In Proceedings of the Thirteenth National Conference on Artificial Intelligence, 1044-1049. The AAAI Press/MIT Press. 

J.G. Schmolze and T. Lipkis. Classification in the KL-ONE knowledge representation system. Proceedings of 8th Int'l Joint Conference on Artificial Intelligence (IJCAI83), 1983. 

S. Soderland. Learning to extract text-based information from the world wide web. In the Proceedings of the Third International Conference on Knowledge Discover# and Data Mining (KDD-97). 

Text REtrieval Conference. http://trec.nist.gov 1999 

W.A. Woods. Understanding Subsumption and Taxonomy: A Framework for Progress. In the Principles of Semantic Networks: Explorations in the Representation of Knowledge, Morgan Kaufmann, San Mateo, Calif. 1991, pages 45-94. 

W.A. Woods. A Better way to Organize Knowledge. Technical Report of Sun Microsystems Inc., 1997. 
