AN EMPIRICAL STUDY ON THEMATIC KNOWLEDGE ACQUISITION 
BASED ON SYNTACTIC CLUES AND HEURISTICS 
Rey-Long Liu* and Von-Wun Soo** 
Department of Computer Science 
National Tsing-Hua University 
HsinChu, Taiwan, R.O.C. 
Email: dr798303@cs.nthu.edu.tw* and soo@cs.nthu.edu.tw** 
Abstract 
Thematic knowledge is a basis of semantic interpretation.
In this paper, we propose a method that acquires thematic
knowledge by exploiting syntactic clues from training
sentences. The syntactic clues, which can be easily collected
by most existing syntactic processors, reduce the hypothesis
space of the thematic roles. The remaining ambiguities may be
further resolved by evidence from either a trainer or a large
corpus. A set of heuristics based on linguistic constraints is
employed to guide the ambiguity resolution process. When a
trainer is available, the system generates new sentences whose
thematic validity can be judged by the trainer. When a large
corpus is available, the thematic validity may be judged by
observing the sentences in the corpus. In this way, a
syntactic processor may become a thematic recognizer by simply
deriving its thematic knowledge from its own syntactic
knowledge.
Keywords: Thematic Knowledge Acquisition, Syntactic Clues,
Heuristics-guided Ambiguity Resolution, Corpus-based
Acquisition, Interactive Acquisition
1. INTRODUCTION 
Natural language processing (NLP) systems need various kinds
of knowledge, including syntactic, semantic, discourse, and
pragmatic knowledge, in different applications. Perhaps
because syntactic theories and formalisms are relatively well
established, many syntactic processing systems have been
either manually constructed or automatically extended by
various acquisition methods (Asker92, Berwick85, Brent91,
Liu92b, Lytinen90, Samuelsson91, Simmons91, Sanfilippo92,
Smadja91, and Sekine92). However, satisfactory representation
and acquisition methods for domain-independent semantic,
discourse, and pragmatic knowledge have not yet been developed
or computationally implemented. NLP systems often face a
dilemma of semantic representation. A sophisticated
representation of semantics has better expressive power but
makes acquisition difficult in practice. On the other hand,
the inadequacy of a naive semantic representation may degrade
the performance of NLP systems. Therefore, to make acquisition
and processing feasible, domain-dependent semantic bias was
often employed in previous acquisition systems (Grishman92b,
Lang88, Lu89, and Velardi91).
In this paper, we present an implemented system that acquires
domain-independent thematic knowledge using available
syntactic resources (e.g. syntactic processing systems and
syntactically processed corpora). Thematic knowledge can
represent semantic or conceptual entities. For correct and
efficient parsing, thematic expectation serves as a basis for
conflict resolution (Taraban88). For natural language
understanding and other applications (e.g. machine
translation), thematic role recognition is a major step.
Thematic relations may serve as the vocabulary shared by the
parser, the discourse model, and the world knowledge
(Tanenhaus89). More importantly, since thematic structures are
perhaps most closely linked to syntactic structures
(Jackendoff72), thematic knowledge acquisition may be more
feasible when only syntactic resources are available. Taking
into account the availability of the resources from which
thematic knowledge may be derived improves the practical
feasibility of the acquisition method.
In general, lexical knowledge of a lexical head should (at
least) include 1) the number of arguments of the lexical head,
2) syntactic properties of the arguments, and 3) thematic
roles of the arguments (the argument structure). The first two
components may be either already constructed in available
syntactic processors or acquired by many syntactic acquisition
systems. However, the acquisition of the thematic roles of the
arguments deserves more exploration. A constituent may have
different thematic roles for different verbs in different
uses. For example, "John" has different thematic roles in
(1.1) - (1.4).
(1.1) [Agent John] turned on the light.
(1.2) [Goal John] inherited a million dollars.
(1.3) The magic wand turned [Theme John] into a frog.
(1.4) The letter reached [Goal John] yesterday.
To acquire thematic lexical knowledge, the precise thematic
roles of the arguments in the sentences need to be determined.

Table 1. Syntactic clues for hypothesizing thematic roles

Theta role         Constituent    Preposition in PP
Agent (Ag)         NP             by
Goal (Go)          NP             till, until, to, into, down
Source (So)        NP             from
Instrument (In)    NP             with, by
Theme (Th)         NP             of, about
Beneficiary (Be)   NP             for
Location (Lo)      NP, ADJP       at, in, on, under
Time (Ti)          NP(Ti)         at, in, before, after, about, by, on, during
Quantity (Qu)      NP(Qu)         for
Proposition (Po)   Proposition    none
Manner (Ma)        ADVP, PP       in, with
Cause (Ca)         NP             by, for, because of
Result (Re)        NP             in, into

Animate, Subject, and Object columns (row alignment not
recoverable): Animate/Subject entries Y, y(animate),
y(animate), y(no Ag), Y, n, Y, Y; Object entries n, n, n, Y.

In the next section, the thematic roles con- 
sidered in this paper are listed. The syntactic proper- 
ties of the thematic roles are also summarized. The 
syntactic properties serve as a preliminary filter to 
reduce the hypothesis space of possible thematic roles 
of arguments in training sentences. To further resolve 
the ambiguities, heuristics based on various linguistic 
phenomena and constraints are introduced in section 
3. The heuristics serve as a general guidance for the 
system to collect valuable information to discriminate 
thematic roles. The current status of the experiment is
reported in section 4. In section 5, the method is
evaluated and related to previous methodologies. We
conclude, in section 6, that by properly collecting
discrimination information from available sources,
thematic knowledge acquisition may be more feasible
in practice.
2. THEMATIC ROLES AND SYNTACTIC CLUES
The thematic roles considered in this paper and the syntactic
clues for identifying them are presented in Table 1. The
syntactic clues include 1) the possible syntactic constituents
of the arguments, 2) whether the arguments are animate or
inanimate, 3) the grammatical functions (subject or object) of
the arguments when they are Noun Phrases (NPs), and 4) the
prepositions of the prepositional phrases in which the
arguments may occur. The syntactic constituents include NP,
Proposition (Po), Adverbial Phrase (ADVP), Adjective Phrase
(ADJP), and Prepositional Phrase (PP). In addition to common
animate nouns (e.g. he, she, and I), proper nouns are treated
as animate NPs as well. In Table 1, "y", "n", "?", and "-"
denote "yes", "no", "don't care", and "seldom" respectively.
For example, an Agent should be an animate NP which may be at
the subject (but not object) position, and if it is in a PP,
the preposition of the PP should be "by" (e.g. "John" in "the
light is turned on by John").
We consider the thematic roles to be well known and widely
referenced, although slight differences may be found among
various works. The intrinsic properties of the thematic roles
have been discussed from various perspectives in the
literature (Jackendoff72 and Gruber76). Grimshaw88 and Levin86
discussed the problems of thematic role marking in so-called
light verbs and adjectival passives. More detailed
descriptions of the thematic roles may be found in the
literature. To illustrate the thematic roles, consider
(2.1)-(2.9).
(2.1) [Ag The robber] robbed [So the bank] of [Th the money].
(2.2) [Th The rock] rolled down [Go the hill].
(2.3) [In The key] can open [Th the door].
(2.4) [Go Will] inherited [Qu a million dollars].
(2.5) [Th The letter] finally reached [Go John].
(2.6) [Lo The restaurant] can dine [Th fifty people].
(2.7) [Ca A fire] burned down [Th the house].
(2.8) [Ag John] bought [Be Mary] [Th a coat] [Ma reluctantly].
(2.9) [Ag John] promised [Go Mary] [Po to marry her].
When a training sentence is entered, arguments of lexical
verbs in the sentence need to be extracted before learning.
This can be achieved by invoking a syntactic processor. If the
sentence is selected from a syntactically processed corpus
(such as the PENN treebank), the arguments may be directly
extracted from the corpus. To identify the thematic roles of
the arguments, Table 1 is consulted.

Table 2. Heuristics for discriminating thematic roles

• Volition Heuristic (VH): Purposive constructions (e.g. in order to) and purposive adverbials (e.g. deliberately and
intentionally) may occur in sentences with Agent arguments (Gruber76).
• Imperative Heuristic (IH): Imperatives are permissible only for Agent subjects (Gruber76).
• Thematic Hierarchy Heuristic (THH): Given a thematic hierarchy (from higher to lower) "Agent > Location,
Source, Goal > Theme", the passive by-phrases must reside at a higher level than the derived subjects in the
hierarchy (i.e. the Thematic Hierarchy Condition in Jackendoff72). In this paper, we set up the hierarchy: Agent >
Location, Source, Goal, Instrument, Cause > Theme, Beneficiary, Time, Quantity, Proposition, Manner, Result.
Subjects and objects cannot reside at the same level.
• Preposition Heuristic (PH): The prepositions of the PPs in which the arguments occur often convey good
discrimination information for resolving thematic role ambiguities (see the "Preposition in PP" column in Table 1).
• One-Theme Heuristic (OTH): An argument is preferred to be Theme if it is the only possible Theme in the
argument structure.
• Uniqueness Heuristic (UH): No two arguments may receive the same thematic role (exclusive of conjunctions
and anaphora which co-relate two constituents assigned with the same thematic role).

For example, consider (2.1) as the training sentence. Since
"the robber" is an animate NP with the subject grammatical
function, it can only qualify for Ag, Go, So, and Th.
Similarly, since "the bank" is an inanimate NP with the object
grammatical function, it can only satisfy the requirements of
Go, So, Th, and Re. Because of the preposition "of", "the
money" can only be Th. As a result, after consulting the
constraints in Table 1, "the robber", "the bank", and "the
money" can only be {Ag, Go, So, Th}, {Go, So, Th, Re}, and
{Th} respectively. Therefore, although the clues in Table 1
serve as a filter, many thematic role ambiguities still call
for other discrimination information and resolution
mechanisms.
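The filtering step for (2.1) can be sketched in Python. This is a minimal illustration, not the authors' implementation: the class `Clue`, the function `candidate_roles`, and the subject/object flags are hypothetical and were chosen so that the sketch reproduces the hypotheses stated above for (2.1). Only the preposition sets are taken from Table 1, and only five of the thirteen roles are encoded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Clue:
    subject: str             # "any", "animate", or "no" (assumed encoding)
    object_ok: bool          # may the role appear in object position?
    prepositions: frozenset  # prepositions listed for the role in Table 1


# Hypothetical subset of Table 1. The subject/object flags are assumptions
# made for this sketch; the prepositions follow the table.
CLUES = {
    "Ag": Clue("animate", False, frozenset({"by"})),
    "Go": Clue("animate", True, frozenset({"till", "until", "to", "into", "down"})),
    "So": Clue("animate", True, frozenset({"from"})),
    "Th": Clue("any", True, frozenset({"of", "about"})),
    "Re": Clue("no", True, frozenset({"in", "into"})),
}


def candidate_roles(position, animate=False, preposition=None):
    """Return the roles compatible with one argument.

    position is "subject", "object", or "pp".
    """
    roles = set()
    for role, clue in CLUES.items():
        if position == "subject":
            ok = clue.subject == "any" or (clue.subject == "animate" and animate)
        elif position == "object":
            ok = clue.object_ok
        else:
            ok = preposition in clue.prepositions
        if ok:
            roles.add(role)
    return roles


# Arguments of (2.1) "The robber robbed the bank of the money".
robber = candidate_roles("subject", animate=True)
bank = candidate_roles("object", animate=False)
money = candidate_roles("pp", preposition="of")
print(sorted(robber), sorted(bank), sorted(money))
```

Under these assumed flags the three sets are {Ag, Go, So, Th}, {Go, So, Th, Re}, and {Th}, matching the hypotheses given in the text.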
3. FINDING EXTRA INFORMATION FOR RESOLVING THETA ROLE
AMBIGUITIES
The remaining thematic role ambiguities should be resolved by
evidence from other sources. Trainers and corpora are the two
most commonly available sources of the extra information.
Interactive acquisition has been applied in various systems in
which the trainer, acting as an oracle, may resolve most
ambiguities (e.g. Lang88, Liu93, Lu89, and Velardi91).
Corpus-based acquisition systems may also converge to
satisfactory performance by collecting evidence from a large
corpus (e.g. Brent91, Sekine92, Smadja91, and Zernik89). We
are concerned with the kinds of information the available
sources may contribute to thematic knowledge acquisition.
The heuristics to discriminate thematic roles are proposed in
Table 2. The heuristics suggest to the system ways of
collecting useful information for resolving ambiguities.
Volition Heuristic and Imperative Heuristic are for confirming
the Agent role, One-Theme Heuristic is for Theme, while
Thematic Hierarchy Heuristic, Preposition Heuristic, and
Uniqueness Heuristic may be used in a general way.
It should be noted that, for the purposes of efficient
acquisition, not all of the heuristics are identical to the
corresponding original linguistic postulations. For example,
Thematic Hierarchy Heuristic was motivated by the Thematic
Hierarchy Condition (Jackendoff72) but embeds more constraints
to filter out more hypotheses. One-Theme Heuristic is a
relaxed version of the statement "every sentence has a theme",
which might be too strong in many cases (Jackendoff87).
Because of the space limit, we only use an example to
illustrate the idea. Consider (2.1) "The robber robbed the
bank of the money" again. As mentioned above, after applying
the preliminary syntactic clues, "the robber", "the bank", and
"the money" may be {Ag, Go, So, Th}, {Go, So, Th, Re}, and
{Th} respectively. By applying Uniqueness Heuristic to the
Theme role, the argument structure of "rob" in the sentence
can only be
(AS1) "{Ag, Go, So}, {Go, So, Re}, {Th}",
which means that the external argument is {Ag, Go, So} and the
internal arguments are {Go, So, Re} and {Th}. Based on the
intermediate result, Volition Heuristic, Imperative Heuristic,
Thematic Hierarchy Heuristic, and Preposition Heuristic could
be invoked to further resolve ambiguities.
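The Uniqueness Heuristic step that yields (AS1) can be read as a simple constraint propagation. The Python sketch below is one possible reading, not the implemented system: the function name `apply_uniqueness` is hypothetical, and it assumes that a role counts as "taken" only when some argument has exactly one remaining candidate.

```python
def apply_uniqueness(hypothesis):
    """Remove any role that is fixed for one argument from all other arguments.

    hypothesis is a list of sets of role labels, one set per argument.
    A new list is returned; the input is left unchanged.
    """
    hyp = [set(roles) for roles in hypothesis]
    changed = True
    while changed:
        changed = False
        for i, roles in enumerate(hyp):
            if len(roles) == 1:
                (fixed,) = roles
                for j, other in enumerate(hyp):
                    if j != i and fixed in other:
                        other.discard(fixed)
                        changed = True
    return hyp


# Hypotheses for "the robber", "the bank", "the money" after the Table 1 clues.
after_clues = [{"Ag", "Go", "So", "Th"}, {"Go", "So", "Th", "Re"}, {"Th"}]
as1 = apply_uniqueness(after_clues)
print([sorted(roles) for roles in as1])
```

Because "the money" can only be Theme, Theme is removed from the other two arguments, which gives the sets listed in (AS1).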
Volition Heuristic and Imperative Heuristic ask the learner to
verify the validity of sentences such as "John intentionally
robbed the bank" ("John" and "the robber" match because they
have the same properties considered in Table 1 and Table 2).
If the sentence is "accepted", an Agent is needed for "rob".
Therefore, the argument structure becomes
(AS2) "{Ag}, {Go, So, Re}, {Th}".
Thematic Hierarchy Heuristic guides the learner to test the
validity of the passive form of (2.1). Since sentences like
"The bank is robbed by Mary" could be valid, "the robber" is
higher than "the bank" in the Thematic Hierarchy. Therefore,
the learner may conclude that either AS3 or AS4 may be the
argument structure of "rob":
(AS3) "{Ag}, {Go, So, Re}, {Th}"
(AS4) "{Go, So}, {Re}, {Th}".
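The two alternatives can be derived mechanically from the hierarchy in Table 2. The following Python sketch is illustrative only: the `LEVEL` encoding (0 for the highest level) and the function `alternatives` are assumptions introduced here, and the sketch covers just the subject/object pair that the passive test compares.

```python
# Levels of the hierarchy set up in Table 2 (0 is the highest level).
LEVEL = {
    "Ag": 0,
    "Lo": 1, "So": 1, "Go": 1, "In": 1, "Ca": 1,
    "Th": 2, "Be": 2, "Ti": 2, "Qu": 2, "Po": 2, "Ma": 2, "Re": 2,
}


def alternatives(subject_roles, object_roles):
    """List (subject roles, object roles) pairs in which the subject is
    strictly higher in the hierarchy than the object."""
    result = []
    for level in sorted({LEVEL[s] for s in subject_roles}):
        subs = {s for s in subject_roles if LEVEL[s] == level}
        objs = {o for o in object_roles if LEVEL[o] > level}
        if objs:
            result.append((subs, objs))
    return result


# Subject and object hypotheses of "rob" taken from (AS1).
alts = alternatives({"Ag", "Go", "So"}, {"Go", "So", "Re"})
for subs, objs in alts:
    print(sorted(subs), sorted(objs))
```

With the (AS1) hypotheses as input, the first pair corresponds to (AS3) and the second to (AS4).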
Preposition Heuristic suggests that the learner resolve
ambiguities based on the prepositions of PPs. For example, it
may suggest that the system confirm: The money is from the
bank? If so, "the bank" is recognized as Source. The argument
structure becomes
(AS5) "{Ag, Go}, {So}, {Th}".
Combining (AS5) with (AS3) or (AS5) with (AS2), the learner
may conclude that the argument structure of "rob" is "{Ag},
{So}, {Th}".
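One way to read "combining" two intermediate argument structures is as an argument-wise set intersection. The Python sketch below takes that reading as an assumption; the helper name `combine` is hypothetical.

```python
def combine(*structures):
    """Intersect several argument structures position by position.

    Each structure is a list of sets of role labels, one set per argument,
    and all structures are assumed to have the same number of arguments.
    """
    size = len(structures[0])
    return [
        set.intersection(*(set(structure[i]) for structure in structures))
        for i in range(size)
    ]


AS2 = [{"Ag"}, {"Go", "So", "Re"}, {"Th"}]
AS3 = [{"Ag"}, {"Go", "So", "Re"}, {"Th"}]
AS5 = [{"Ag", "Go"}, {"So"}, {"Th"}]

final_from_as3 = combine(AS5, AS3)
final_from_as2 = combine(AS5, AS2)
print([sorted(roles) for roles in final_from_as3])
```

Both combinations leave a single role per argument: Agent, Source, and Theme.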
In summary, as the arguments of lexical heads are entered into
the acquisition system, the clues in Table 1 are consulted
first to reduce the hypothesis space. The heuristics in Table
2 are then invoked to further resolve the ambiguities by
collecting useful information from other sources. The
information that the heuristics direct the system to collect
is the thematic validity of sentences that may help to confirm
the target thematic roles.
The confirmation information required by Volition Heuristic,
Imperative Heuristic, and Thematic Hierarchy Heuristic may
come from corpora (and of course trainers as well), while
Preposition Heuristic sometimes needs information only
available from trainers. This is because the derivation of new
PPs might generate ungrammatical sentences not available in
general corpora. For example, (3.1) from (2.3) "The key can
open the door" is grammatical, while (3.2) from (2.5) "The
letter finally reached John" is ungrammatical.
(3.1) The door is opened by the key.
(3.2) *The letter finally reached to John.
Therefore, simple queries as above are preferred in the
method.
It should also be noted that since these heuristics only serve
as guidelines for finding discrimination information, the
sequence of their application does not have significant
effects on the result of learning. However, the number of
queries may be minimized by applying the heuristics in the
order: Volition Heuristic and Imperative Heuristic -> Thematic
Hierarchy Heuristic -> Preposition Heuristic. One-Theme
Heuristic and Uniqueness Heuristic are invoked each time the
current hypotheses of thematic roles are changed by the
application of the clues, Volition Heuristic, Imperative
Heuristic, Thematic Hierarchy Heuristic, or Preposition
Heuristic. This is because One-Theme Heuristic and Uniqueness
Heuristic are constraint-based. Given a hypothesis of thematic
roles, they may be employed to filter out impossible
combinations of thematic roles without using any queries.
Therefore, as a query is issued by other heuristics and
answered by the trainer or the corpus, the two heuristics may
be used to "extend" the result by further reducing the
hypothesis space.
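The interplay between query-issuing heuristics and the two constraint-based heuristics can be sketched as a small loop. The Python code below is a simplified illustration under several assumptions that are not in the paper: the names `propagate` and `acquire` are hypothetical, each query asks whether one argument holds one role, a "no" answer is taken to rule that role out, and Thematic Hierarchy queries are left out.

```python
def propagate(hypothesis):
    """Apply Uniqueness and One-Theme Heuristics until nothing changes.
    No queries are needed for this step."""
    hyp = [set(roles) for roles in hypothesis]
    changed = True
    while changed:
        changed = False
        # Uniqueness Heuristic: a role fixed for one argument is removed elsewhere.
        for i, roles in enumerate(hyp):
            if len(roles) == 1:
                (fixed,) = roles
                for j, other in enumerate(hyp):
                    if j != i and fixed in other:
                        other.discard(fixed)
                        changed = True
        # One-Theme Heuristic: the only possible Theme is taken to be Theme.
        holders = [i for i, roles in enumerate(hyp) if "Th" in roles]
        if len(holders) == 1 and hyp[holders[0]] != {"Th"}:
            hyp[holders[0]] = {"Th"}
            changed = True
    return hyp


def acquire(hypothesis, queries, ask):
    """Resolve roles with (argument index, role, sentence) queries.

    ask is a callable (trainer or corpus lookup) returning True or False.
    Returns the final hypothesis and the number of queries actually asked.
    """
    hyp = propagate(hypothesis)
    asked = 0
    for index, role, sentence in queries:
        if len(hyp[index]) == 1 or role not in hyp[index]:
            continue  # the query can no longer change anything
        asked += 1
        if ask(sentence):
            hyp[index] = {role}
        else:
            hyp[index].discard(role)  # assumption: "no" excludes the role
        hyp = propagate(hyp)
    return hyp, asked


start = [{"Ag", "Go", "So", "Th"}, {"Go", "So", "Th", "Re"}, {"Th"}]
queries = [
    (0, "Ag", "John intentionally robbed the bank."),  # Volition Heuristic
    (1, "So", "The money is from the bank?"),          # Preposition Heuristic
]
result, asked = acquire(start, queries, ask=lambda sentence: True)
print([sorted(roles) for roles in result], asked)
```

With a trainer who accepts both sentences, the sketch reaches the argument structure {Ag}, {So}, {Th} for "rob" after two queries.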
4. EXPERIMENT 
As described above, the proposed acquisition method requires
syntactic information of arguments as input (recall Table 1).
We believe that syntactic information is one of the most
commonly available resources. It may be collected from a
syntactic processor or a syntactically processed corpus. To
test the method with a public corpus as in Grishman92a, the
PENN TreeBank was used as a syntactically processed corpus for
learning. Argument packets (including VP packets and NP
packets) were extracted from the ATIS corpus (including JUN90,
SRI_TB, and TI_TB tree files), the MARI corpus (including
AMBIG and WBUR tree files), the MUC1 corpus, and the MUC2
corpus of the treebank. VP packets and NP packets recorded
syntactic properties of the arguments of verbs and nouns
respectively.

Table 3. Argument extraction from TreeBank

Corpus   Sentences   Words   VP packets   Verbs   NP packets   Nouns
ATIS     1373        15286   1716         138     959          188
MARI     543         9897    1067         509     425          288
MUC1     1026        22662   1916         732     907          490
MUC2     3341        73548   6410         1556    3313         1177

Since not all constructions involving movement 
were tagged with trace information in the corpus, to 
derive the arguments, the procedure needs to consider 
the constructions of passivization, interjection, and 
unbounded dependency (e.g. in relative clauses and 
wh-questions). That is, it needs to determine whether 
a constituent is an argument of a verb (or noun), 
whether an argument is moved, and if so, which con- 
stituent is the moved argument. Basically, Case 
Theory, Theta Theory (Chomsky81), and Foot 
Feature Principle (Gazdar85) were employed to locate 
the arguments (Liu92a, Liu92b). 
Table 3 summarizes the results of the argument extraction.
About 96% of the trees were extracted. Parse trees with too
many words (60) or nodes (i.e. 50 subgoals of parsing) were
discarded. All VP packets in the parse trees were derived, but
only the NP packets having PPs as modifiers were extracted.
These PPs could help the system to hypothesize argument
structures of nouns. The extracted packets were assimilated
into an acquisition system (called EBNLA, Liu92a) as syntactic
subcategorization frames. Different morphological forms of a
lexical item were not counted as different verbs or nouns.
As an example of the extracted argument packets, consider the
following sentence from MUC1:
"..., at la linea ..... where a FARC front ambushed an
11th brigade army patrol".
The extraction procedure derived the following VP packet for
"ambushed":
ambushed (NP: a FARC front) (WHADVP: where)
(NP: an 11th brigade army patrol)
The first NP was the external argument of the verb. Other
constituents were internal arguments of the verb. The
procedure could not determine whether an argument was optional
or not.
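A VP packet of this form maps naturally onto a small record type. The Python sketch below is only an illustration of the notation shown above: the `ArgumentPacket` class and the `parse_packet` function are hypothetical names, and the parser assumes the packet is written on one line in the "head (CATEGORY: words) ..." form.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ArgumentPacket:
    head: str                        # the verb (or noun) the packet describes
    external: Tuple[str, str]        # (category, words) of the external argument
    internal: List[Tuple[str, str]]  # remaining constituents, in order


def parse_packet(text):
    """Parse 'head (CAT: words) (CAT: words) ...' into an ArgumentPacket.

    Following the description in the text, the first NP is treated as the
    external argument and all other constituents as internal arguments.
    """
    head, _, rest = text.partition(" ")
    constituents = [
        (category, words.strip())
        for category, words in re.findall(r"\((\w+):\s*([^)]*)\)", rest)
    ]
    first_np = next(i for i, (cat, _) in enumerate(constituents) if cat == "NP")
    external = constituents[first_np]
    internal = constituents[:first_np] + constituents[first_np + 1:]
    return ArgumentPacket(head, external, internal)


packet = parse_packet(
    "ambushed (NP: a FARC front) (WHADVP: where) (NP: an 11th brigade army patrol)"
)
print(packet)
```

The record deliberately has no field for optionality, since the extraction procedure could not determine it.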
In the corpora, most packets were for a small number of verbs
(e.g. 296 packets for "show" were found in ATIS). Only 1 to 2
packets could be found for most verbs. Therefore, although the
parse trees could provide argument packets of good quality,
the information was too sparse to resolve thematic role
ambiguities. This is a weakness inherent in most corpus-based
acquisition methods, since the learner might ultimately fail
to collect sufficient information after spending much effort
to process the corpus. In that case, the ambiguities need to
be temporarily suspended. To speed up learning and focus on
the usage of the proposed method, a trainer was asked to check
the thematic validity (yes/no) of the sentences generated by
the learner.
Excluding packets of some special verbs to be discussed later
and erroneous packets (due to a small number of
inconsistencies and omissions in the corpus and the extraction
procedure), the packets were fed into the acquisition system
(one packet for a verb). The average accuracy rate of the
acquired argument structures was 0.86. An argument structure
was counted as correct if it was unambiguous and confirmed by
the trainer. On average, for resolving ambiguities, 113
queries were generated for every 100 successfully acquired
argument structures. The packets from ATIS caused fewer
ambiguities, since in this corpus there were many imperative
sentences to which Imperative Heuristic may be applied.
Volition Heuristic, Thematic Hierarchy Heuristic, and
Preposition Heuristic had almost equal frequencies of
application in the experiment.
As an example of how the clues and heuristics could
successfully derive argument structures of verbs, consider the
sentence from ATIS:
"The flight going to San Francisco ...".
Without issuing any queries, the learner concluded that an
argument structure of "go" is "{Th}, {Go}". This was because,
according to the clues, "San Francisco" could only be Goal,
while according to One-Theme Heuristic, "the flight" was
recognized as
Theme. Most argument structures were acquired 
using 1 to ~ queries. 
The result showed that, after (manually or automatically)
acquiring an argument packet (i.e. a syntactic
subcategorization frame plus the syntactic constituent of the
external argument) of a verb, the acquisition method could be
invoked to upgrade the syntactic knowledge to thematic
knowledge by issuing only 113 queries for every 100 argument
packets. Since checking the validity of the generated
sentences is not a heavy burden for the trainer (answering
'yes' or 'no' only), the method may be attached to various
systems for promoting incremental extensibility of thematic
knowledge.
The way of counting the accuracy rate of the 
acquired argument structures deserves notice. Failed 
cases were mainly due to the clues and heuristics that 
were too strong or overly committed. For example, 
the thematic role of "the man" in (4.1) from MARI 
could not be acquired using the clues and heuristics. 
(4.1) Laura ran away with the man. 
In the terminology of Gruber76, this is an expression 
of accompaniment which is not considered in the 
clues and heuristics. As another example, consider 
(4.2) also from MARI. 
(4.2) The greater Boston area ranked eight among 
major cities for incidence of AIDS. 
The clues and heuristics could not draw any conclu- 
sions on the possible thematic roles of "eight". 
On the other hand, the cases counted as "failed" did not
always lead to "erroneous" argument structures. For example,
"Mary" in (2.9) "John promised Mary to marry her" was treated
as Theme rather than Goal, because "Mary" is the only possible
Theme. Although "Mary" may be Theme in this case as well,
treating "Mary" as Goal is more fine-grained.
The clues and heuristics may often lead to acceptable argument
structures, even if the argument structures are inherently
ambiguous. For example, an NP might function as more than one
thematic role within a sentence (Jackendoff87). In (4.3),
"John" may be Agent or Source.
(4.3) John sold Mary a coat.
Since Thematic Hierarchy Heuristic assumes that subjects and
objects cannot reside at the same level, "John" must not be
assigned as Source. Therefore, "John" and "Mary" are assigned
as Agent and Goal respectively, and the ambiguity is resolved.
In addition, some thematic roles may cause ambiguities if only
syntactic evidence is available. Experiencer, such as "John"
in (4.4), and Maleficiary, such as "Mary" in (4.5), are two
examples.
(4.4) Mary surprised John.
(4.5) Mary suffers a headache.
There are difficulties in distinguishing Experiencer, Agent,
Maleficiary, and Theme. Fortunately, the verbs with
Experiencer and Maleficiary may be enumerated before learning.
Therefore, the argument structures of these verbs are manually
constructed rather than learned by the proposed method.
5. RELATED WORK 
To explore the acquisition of domain-independent semantic
knowledge, the universal linguistic constraints postulated by
many linguistic studies may provide general (and perhaps
coarse-grained) hints. The hints may be integrated with
domain-specific semantic bias for various applications as
well. In this branch of study, GB theory (Chomsky81) and
universal feature instantiation principles (Gazdar85) have
been shown to be applicable in syntactic knowledge acquisition
(Berwick85, Liu92a, Liu92b). The proposed method is closely
related to those methodologies. The major difference is that
various thematic theories are selected and computationalized
for thematic knowledge acquisition. The idea of structural
patterns in Montemagni92 is similar to Preposition Heuristic
in that the patterns suggest general guidance for information
extraction.
Extra information resources are needed for thematic knowledge
acquisition. From the cognitive point of view, morphological,
syntactic, semantic, contextual (Jacobs88), pragmatic, and
world knowledge, and observations of the environment
(Webster89, Siskind90) are all important resources. However,
the limited availability of these resources often reduces the
feasibility of learning from a practical standpoint. The
acquisition often becomes "circular" when relying on semantic
information to acquire target semantic information.
Predefined domain linguistic knowledge is another important
source of information for constraining the hypothesis space in
learning (or for semantic bootstrapping). From this point of
view, lexical categories (Zernik89, Zernik90) and theory of
lexical semantics (Pustejovsky87a, Pustejovsky87b) played
roles similar to those of the clues and heuristics employed in
this paper. The previous approaches had demonstrated
theoretical interest, but their performance on large-scale
acquisition was not elaborated. We feel that requiring the
system to use available resources only (i.e. syntactic
processors and/or syntactically processed corpora) may make
large-scale implementations more feasible. This research
investigates the extent to which an acquisition system may
acquire thematic knowledge when only the syntactic resources
are available.
McClelland86 showed a connectionist model for thematic role
assignment. By manually encoding training assignments and
semantic microfeatures for a limited number of verbs and
nouns, the connectionist network learned how to assign roles.
Stochastic approaches (Smadja91, Sekine92) also employed
available corpora to acquire collocational data for resolving
ambiguities in parsing. However, they acquired numerical
values by observing the whole training corpus (non-incremental
learning). Explanation for those numerical values is difficult
to derive in those models. As far as large-scale thematic
knowledge acquisition is concerned, the incremental
extensibility of the models needs to be further improved.
6. CONCLUSION 
Preliminary syntactic analysis can be achieved by many natural
language processing systems. For semantic interpretation of
input sentences, thematic lexical knowledge is needed.
Although each lexical item may have its own idiosyncratic
thematic requirements on arguments, there exist syntactic
clues for hypothesizing the thematic roles of the arguments.
Therefore, exploiting the information derived from syntactic
analysis to acquire thematic knowledge becomes a plausible way
to build an extensible thematic dictionary. In this paper,
various syntactic clues are integrated to hypothesize thematic
roles of arguments in training sentences. Heuristics-guided
ambiguity resolution is invoked to collect extra
discrimination information from the trainer or the corpus. As
more syntactic resources become available, the method could
upgrade the acquired knowledge from syntactic level to
thematic level.
Acknowledgement 
This research is supported in part by NSC (National 
Science Council of R.O.C.) under the grant NSC82- 
0408-E-007-029 and NSC81-0408-E007-19 from 
which we obtained the PENN TreeBank by Dr. 
Hsien-Chin Liou. We would like to thank the 
anonymous reviewers for their helpful comments. 

References 
[Asker92] Asker L., Gamback B., and Samuelsson C., EBL2: An
Application to Automatic Lexical Acquisition, Proc. of COLING,
pp. 1172-1176, 1992.
[Berwick85] Berwick R. C., The Acquisition of Syntactic
Knowledge, The MIT Press, Cambridge, Massachusetts, London,
England, 1985.
[Brent91] Brent M. R., Automatic Acquisition of
Subcategorization Frames from Untagged Text, Proc. of the 29th
annual meeting of the ACL, pp. 209-214, 1991.
[Chomsky81] Chomsky N., Lectures on Government and Binding,
Foris Publications, Dordrecht, 1981.
[Gazdar85] Gazdar G., Klein E., Pullum G. K., and Sag I. A.,
Generalized Phrase Structure Grammar, Harvard University
Press, Cambridge, Massachusetts, 1985.
[Grimshaw88] Grimshaw J. and Mester A., Light Verbs and
Theta-Marking, Linguistic Inquiry, Vol. 19, No. 2, pp.
205-232, 1988.
[Grishman92a] Grishman R., Macleod C., and Sterling J.,
Evaluating Parsing Strategies Using Standardized Parse Files,
Proc. of the Third Applied NLP, pp. 156-161, 1992.
[Grishman92b] Grishman R. and Sterling J., Acquisition of
Selectional Patterns, Proc. of COLING-92, pp. 658-664, 1992.
[Gruber76] Gruber J. S., Lexical Structures in Syntax and
Semantics, North-Holland Publishing Company, 1976.
[Jackendoff72] Jackendoff R. S., Semantic Interpretation in
Generative Grammar, The MIT Press, Cambridge, Massachusetts,
1972.
[Jackendoff87] Jackendoff R. S., The Status of Thematic
Relations in Linguistic Theory, Linguistic Inquiry, Vol. 18,
No. 3, pp. 369-411, 1987.
[Jacobs88] Jacobs P. and Zernik U., Acquiring Lexical
Knowledge from Text: A Case Study, Proc. of AAAI, pp. 739-744,
1988.
[Lang88] Lang F.-M. and Hirschman L., Improved Portability and
Parsing through Interactive Acquisition of Semantic
Information, Proc. of the second conference on Applied Natural
Language Processing, pp. 49-57, 1988.
[Levin86] Levin B. and Rappaport M., The Formation of
Adjectival Passives, Linguistic Inquiry, Vol. 17, No. 4, pp.
623-661, 1986.
[Liu92a] Liu R.-L. and Soo V.-W., Augmenting and Efficiently
Utilizing Domain Theory in Explanation-Based Natural Language
Acquisition, Proc. of the Ninth International Machine Learning
Conference, ML92, pp. 282-289, 1992.
[Liu92b] Liu R.-L. and Soo V.-W., Acquisition of Unbounded
Dependency Using Explanation-Based Learning, Proc. of ROCLING
V, 1992.
[Liu93] Liu R.-L. and Soo V.-W., Parsing-Driven Generalization
for Natural Language Acquisition, International Journal of
Pattern Recognition and Artificial Intelligence, Vol. 7, No.
3, 1993.
[Lu89] Lu R., Liu Y., and Li X., Computer-Aided Grammar
Acquisition in the Chinese Understanding System CCSAGA, Proc.
of IJCAI, pp. 1550-1555, 1989.
[Lytinen90] Lytinen S. L. and Moon C. E., A Comparison of
Learning Techniques in Second Language Learning, Proc. of the
7th Machine Learning conference, pp. 317-383, 1990.
[McClelland86] McClelland J. L. and Kawamoto A. H., Mechanisms
of Sentence Processing: Assigning Roles to Constituents of
Sentences, in Parallel Distributed Processing, Vol. 2, pp.
272-325, 1986.
[Montemagni92] Montemagni S. and Vanderwende L., Structural
Patterns vs. String Patterns for Extracting Semantic
Information from Dictionary, Proc. of COLING-92, pp. 546-552,
1992.
[Pustejovsky87a] Pustejovsky J. and Berger S., The Acquisition
of Conceptual Structure for the Lexicon, Proc. of AAAI, pp.
566-570, 1987.
[Pustejovsky87b] Pustejovsky J., On the Acquisition of Lexical
Entries: The Perceptual Origin of Thematic Relation, Proc. of
the 25th annual meeting of the ACL, pp. 172-178, 1987.
[Samuelsson91] Samuelsson C. and Rayner M., Quantitative
Evaluation of Explanation-Based Learning as an Optimization
Tool for a Large-Scale Natural Language System, Proc. of
IJCAI, pp. 609-615, 1991.
[Sanfilippo92] Sanfilippo A. and Pozanski V., The Acquisition
of Lexical Knowledge from Combined Machine-Readable Dictionary
Sources, Proc. of the Third Conference on Applied NLP, pp.
80-87, 1992.
[Sekine92] Sekine S., Carroll J. J., Ananiadou S., and Tsujii
J., Automatic Learning for Semantic Collocation, Proc. of the
Third Conference on Applied NLP, pp. 104-110, 1992.
[Simmons91] Simmons R. F. and Yu Y.-H., The Acquisition and
Application of Context Sensitive Grammar for English, Proc. of
the 29th annual meeting of the ACL, pp. 122-129, 1991.
[Siskind90] Siskind J. M., Acquiring Core Meanings of Words,
Represented as Jackendoff-style Conceptual Structures, from
Correlated Streams of Linguistic and Non-linguistic Input,
Proc. of the 28th annual meeting of the ACL, pp. 143-156,
1990.
[Smadja91] Smadja F. A., From N-Grams to Collocations: An
Evaluation of EXTRACT, Proc. of the 29th annual meeting of the
ACL, pp. 279-284, 1991.
[Tanenhaus89] Tanenhaus M. K. and Carlson G. N., Lexical
Structure and Language Comprehension, in Lexical
Representation and Process, William Marslen-Wilson (ed.), The
MIT Press, 1989.
[Taraban88] Taraban R. and McClelland J. L., Constituent
Attachment and Thematic Role Assignment in Sentence
Processing: Influences of Content-Based Expectations, Journal
of Memory and Language, 27, pp. 597-632, 1988.
[Velardi91] Velardi P., Pazienza M. T., and Fasolo M., How to
Encode Semantic Knowledge: A Method for Meaning Representation
and Computer-Aided Acquisition, Computational Linguistics,
Vol. 17, No. 2, pp. 153-170, 1991.
[Webster89] Webster M. and Marcus M., Automatic Acquisition of
the Lexical Semantics of Verbs from Sentence Frames, Proc. of
the 27th annual meeting of the ACL, pp. 177-184, 1989.
[Zernik89] Zernik U., Lexicon Acquisition: Learning from
Corpus by Capitalizing on Lexical Categories, Proc. of IJCAI,
pp. 1556-1562, 1989.
[Zernik90] Zernik U. and Jacobs P., Tagging for Learning:
Collecting Thematic Relation from Corpus, Proc. of COLING, pp.
34-39, 1990.
