Automatic Labeling of Semantic Roles
Daniel Gildea
University of California, Berkeley, and
International Computer Science Institute
gildea@cs.berkeley.edu
Daniel Jurafsky
Department of Linguistics
University of Colorado, Boulder
jurafsky@colorado.edu
Abstract
We present a system for identifying the semantic relationships, or semantic roles, filled by constituents of a sentence within a semantic frame. Various lexical and syntactic features are derived from parse trees and used to derive statistical classifiers from hand-annotated training data.
1 Introduction
Identifying the semantic roles filled by constituents of a sentence can provide a level of shallow semantic analysis useful in solving a number of natural language processing tasks. Semantic roles represent the participants in an action or relationship captured by a semantic frame. For example, the frame for one sense of the verb "crash" includes the roles Agent, Vehicle and To-Location.

This shallow semantic level of interpretation can be used for many purposes. Current information extraction systems often use domain-specific frame-and-slot templates to extract facts about, for example, financial news or interesting political events. A shallow semantic level of representation is a more domain-independent, robust level of representation. Identifying these roles, for example, could allow a system to determine that in the sentence "The first one crashed" the subject is the vehicle, but in the sentence "The first one crashed it" the subject is the agent, which would help in information extraction in this domain. Another application is in word-sense disambiguation, where the roles associated with a word can be cues to its sense. For example, Lapata and Brew (1999) and others have shown that the different syntactic subcategorization frames of a verb like "serve" can be used to help disambiguate a particular instance of the word "serve". Adding semantic role subcategorization information to this syntactic information could extend this idea to use richer semantic knowledge. Semantic roles could also act as an important intermediate representation in statistical machine translation or automatic text summarization and in the emerging field of Text Data Mining (TDM) (Hearst, 1999). Finally, incorporating semantic roles into probabilistic models of language should yield more accurate parsers and better language models for speech recognition.
This paper proposes an algorithm for automatic semantic analysis, assigning a semantic role to constituents in a sentence. Our approach to semantic analysis is to treat the problem of semantic role labeling like the similar problems of parsing, part of speech tagging, and word sense disambiguation. We apply statistical techniques that have been successful for these tasks, including probabilistic parsing and statistical classification. Our statistical algorithms are trained on a hand-labeled dataset: the FrameNet database (Baker et al., 1998). The FrameNet database defines a tagset of semantic roles called frame elements, and includes roughly 50,000 sentences from the British National Corpus which have been hand-labeled with these frame elements. The next section describes the set of frame elements/semantic roles used by our system. In the rest of this paper we report on our current system, as well as a number of preliminary experiments on extensions to the system.
2 Semantic Roles
Historically, two types of semantic roles have been studied: abstract roles such as Agent and Patient, and roles specific to individual verbs such as Eater and Eaten for "eat". The FrameNet project proposes roles at an intermediate level, that of the semantic frame. Frames are defined as schematic representations of situations involving various participants, props, and other conceptual roles (Fillmore, 1976). For example, the frame "conversation", shown in Figure 1, is invoked by the semantically related verbs "argue", "banter", "debate", "converse", and "gossip" as well as the nouns "argument", "dispute", "discussion" and "tiff". The roles defined for this frame, and shared by all its lexical entries, include Protagonist1 and Protagonist2 or simply Protagonists for the participants in the conversation, as well as Medium, and Topic. Example sentences are shown in Table 1. Defining semantic roles at the frame level avoids some of the difficulties of attempting to find a small set of universal, abstract thematic roles, or case roles such as Agent, Patient, etc. (as in, among many others, (Fillmore, 1968), (Jackendoff, 1972)). Abstract thematic roles can be thought of as frame elements defined in abstract frames such as "action" and "motion" which are at the top of an inheritance hierarchy of semantic frames (Fillmore and Baker, 2000).

The preliminary version of the FrameNet corpus used for our experiments contained 67 frames from 12 general semantic domains chosen for annotation. Examples of domains (see Figure 1) include "motion", "cognition" and "communication". Within these frames, examples of a total of 1462 distinct lexical predicates, or target words, were annotated: 927 verbs, 339 nouns, and 175 adjectives. There are a total of 49,013 annotated sentences, and 99,232 annotated frame elements (which do not include the target words themselves).
3 Related Work
Assignment of semantic roles is an important part of language understanding, and has been attacked by many computational systems. Traditional parsing and understanding systems, including implementations of unification-based grammars such as HPSG (Pollard and Sag, 1994), rely on hand-developed grammars which must anticipate each way in which semantic roles may be realized syntactically. Writing such grammars is time-consuming, and typically such systems have limited coverage.

Data-driven techniques have recently been applied to template-based semantic interpretation in limited domains by "shallow" systems that avoid complex feature structures, and often perform only shallow syntactic analysis. For example, in the context of the Air Traveler Information System (ATIS) for spoken dialogue, Miller et al. (1996) computed the probability that a constituent such as "Atlanta" filled a semantic slot such as Destination in a semantic frame for air travel. In a data-driven approach to information extraction, Riloff (1993) builds a dictionary of patterns for filling slots in a specific domain such as terrorist attacks, and Riloff and Schmelzenbach (1998) extend this technique to automatically derive entire case frames for words in the domain. These last systems make use of a limited amount of hand labor to accept or reject automatically generated hypotheses. They show promise for a more sophisticated approach to generalize beyond the relatively small number of frames considered in the tasks. More recently, a domain-independent system has been trained on general function tags such as Manner and Temporal by Blaheta and Charniak (2000).
4 Methodology
We divide the task of labeling frame elements into two subtasks: that of identifying the boundaries of the frame elements in the sentences, and that of labeling each frame element, given its boundaries, with the correct role. We first give results for a system which labels roles using human-annotated boundaries, returning to the question of automatically identifying the boundaries in Section 5.3.
[Figure 1 (diagram): in the Communication domain, the Conversation frame (frame elements Protagonist-1, Protagonist-2, Protagonists, Topic, Medium; lexical entries include talk-v, confer-v, debate-v, converse-v, gossip-v, dispute-n, discussion-n, tiff-n) and the Questioning and Statement frames (frame elements Speaker, Addressee, Message, Topic, Medium); in the Cognition domain, the Judgment frame (frame elements Judge, Evaluee, Reason, Role; lexical entries include blame-v, blame-n, fault-n, admire-v, admiration-n, disapprove-v, appreciate-v, dispute-n) and the Categorization frame (frame elements Cognizer, Item, Category, Criterion).]

Figure 1: Sample domains and frames from the FrameNet lexicon.
Frame Element   Example with target verb             Example with target noun
Protagonist 1   Kim argued with Pat                  Kim had an argument with Pat
Protagonist 2   Kim argued with Pat                  Kim had an argument with Pat
Protagonists    Kim and Pat argued                   Kim and Pat had an argument
Topic           Kim and Pat argued about politics    Kim and Pat had an argument about politics
Medium          Kim and Pat argued in French         Kim and Pat had an argument in French

Table 1: Examples of semantic roles, or frame elements, for target words "argue" and "argument" from the "conversation" frame.
4.1 Features Used in Assigning Semantic Roles
The system is a statistical one, based on training a classifier on a labeled training set, and testing on an unlabeled test set. The system is trained by first using the Collins parser (Collins, 1997) to parse the 36,995 training sentences, matching annotated frame elements to parse constituents, and extracting various features from the string of words and the parse tree. During testing, the parser is run on the test sentences and the same features extracted. Probabilities for each possible semantic role r are then computed from the features. The probability computation will be described in the next section; the features include:
Phrase Type: This feature indicates the syntactic type of the phrase expressing the semantic roles: examples include noun phrase (NP), verb phrase (VP), and clause (S). Phrase types were derived automatically from parse trees generated by the parser, as shown in Figure 2. The parse constituent spanning each set of words annotated as a frame element was found, and the constituent's nonterminal label was taken as the phrase type. As an example of how this feature is useful, in communication frames, the Speaker is likely to appear as a noun phrase, Topic as a prepositional phrase or noun phrase, and Medium as a prepositional phrase, as in: "We talked about the proposal over the phone." When no parse constituent was found with boundaries matching those of a frame element during testing, the largest constituent beginning at the frame element's left boundary and lying entirely within the element was used to calculate the features.
Grammatical Function: This feature attempts to indicate a constituent's syntactic relation to the rest of the sentence, for example as a subject or object of a verb. As with phrase type, this feature was read from parse trees returned by the parser. After experimentation with various versions of this feature, we restricted it to apply only to NPs, as it was found to have little effect on other phrase types. Each NP's nearest S or VP ancestor was found in the parse tree; NPs with an S ancestor were given the grammatical function subject and those with a VP ancestor were labeled object. In general, agenthood is closely correlated with subjecthood. For example, in the sentence "He drove the car over the cliff", the first NP is more likely to fill the Agent role than the second or third.
[Figure 2 (parse tree): "He heard the sound of liquid slurping in a metal container as Farrell approached him from behind", with constituents labeled Theme, Goal, and Source and the target word marked.]

Figure 2: A sample sentence with parser output (above) and FrameNet annotation (below). Parse constituents corresponding to frame elements are highlighted.
Position: This feature simply indicates whether the constituent to be labeled occurs before or after the predicate defining the semantic frame. We expected this feature to be highly correlated with grammatical function, since subjects will generally appear before a verb, and objects after. Moreover, this feature may overcome the shortcomings of reading grammatical function from a constituent's ancestors in the parse tree, as well as errors in the parser output.
Voice: The distinction between active and passive verbs plays an important role in the connection between semantic role and grammatical function, since direct objects of active verbs correspond to subjects of passive verbs. From the parser output, verbs were classified as active or passive by building a set of 10 passive-identifying patterns. Each of the patterns requires both a passive auxiliary (some form of "to be" or "to get") and a past participle.
Head Word: As previously noted, we expected lexical dependencies to be extremely important in labeling semantic roles, as indicated by their importance in related tasks such as parsing. Since the parser used assigns each constituent a head word as an integral part of the parsing model, we were able to read the head words of the constituents from the parser output. For example, in a communication frame, noun phrases headed by "Bill", "brother", or "he" are more likely to be the Speaker, while those headed by "proposal", "story", or "question" are more likely to be the Topic. (A sketch of how these features might be extracted follows this list.)
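As a rough illustration, the sketch below reads the phrase type, grammatical function, and position features off an nltk.Tree parse, with a crude stand-in for the passive test; the ten actual passive-identifying patterns and the parser's head rules are not reproduced here, and all function names are our own.

```python
# A sketch of the feature extraction described above, over nltk parse
# trees. The passive test is a crude stand-in for the paper's ten
# passive-identifying patterns; names here are illustrative, not the
# authors' implementation.
from nltk import Tree

def constituent_features(root: Tree, const_pos: tuple, target_index: int):
    """Phrase type, grammatical function, and position of the constituent
    at tree position const_pos, relative to the target word at leaf
    index target_index."""
    const = root[const_pos]
    phrase_type = const.label()                       # e.g. NP, PP, SBAR

    # Grammatical function, defined only for NPs: nearest S ancestor
    # means subject, nearest VP ancestor means object.
    gf = None
    if phrase_type == "NP":
        for i in range(len(const_pos) - 1, -1, -1):
            ancestor = root[const_pos[:i]].label()
            if ancestor in ("S", "VP"):
                gf = "subject" if ancestor == "S" else "object"
                break

    # Position: does the constituent start before or after the target?
    leaves = root.treepositions("leaves")
    start = leaves.index(const_pos + const.leaf_treeposition(0))
    position = "before" if start < target_index else "after"
    return phrase_type, gf, position

def seems_passive(vp: Tree) -> bool:
    """Passive if a form of 'be' or 'get' co-occurs with a past participle."""
    aux = {"be", "is", "are", "was", "were", "been", "being",
           "get", "gets", "got", "gotten", "getting"}
    return (any(w.lower() in aux for w in vp.leaves())
            and any(tag == "VBN" for _, tag in vp.pos()))

tree = Tree.fromstring(
    "(S (NP (PRP He)) (VP (VBD drove) (NP (DT the) (NN car))))")
print(constituent_features(tree, (0,), 1))     # ('NP', 'subject', 'before')
print(constituent_features(tree, (1, 1), 1))   # ('NP', 'object', 'after')
```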
For our experiments, we divided the FrameNet corpus as follows: one-tenth of the annotated sentences for each target word were reserved as a test set, and another one-tenth were set aside as a tuning set for developing our system. A few target words with fewer than ten examples were removed from the corpus. In our corpus, the average number of sentences per target word is only 34, and the number of sentences per frame is 732, both relatively small amounts of data on which to train frame element classifiers.
Although we expect our features to interact in various ways, the data are too sparse to calculate probabilities directly on the full set of features. For this reason, we built our classifier by combining probabilities from distributions conditioned on a variety of combinations of features.
An important caveat in using the FrameNet database is that sentences are not chosen for annotation at random, and therefore are not necessarily statistically representative of the corpus as a whole. Rather, examples are chosen to illustrate typical usage patterns for each word. We intend to remedy this in future versions of this work by bootstrapping our statistics using unannotated text.
Table 2 shows the probability distributions used in the final version of the system. Coverage indicates the percentage of the test data for which the conditioning event had been seen in training data. Accuracy is the proportion of covered test data for which the correct role is predicted, and Performance, simply the product of coverage and accuracy, is the overall percentage of test data for which the correct role is predicted. Accuracy is somewhat similar to the familiar metric of precision in that it is calculated over cases for which a decision is made, and performance is similar to recall in that it is calculated over all true frame elements. However, unlike a traditional precision/recall trade-off, these results have no threshold to adjust, and the task is a multi-way classification rather than a binary decision. The distributions calculated were simply the empirical distributions from the training data. That is, occurrences of each role and each set of conditioning events were counted in a table, and probabilities calculated by dividing the counts for each role by the total number of observations for each conditioning event. For example, the distribution P(r | pt, t) was calculated as follows:

$$P(r \mid pt, t) = \frac{\#(r, pt, t)}{\#(pt, t)}$$
Some sample probabilities calculated from the training data are shown in Table 3.
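The counting itself is straightforward; a minimal sketch, with illustrative field names for the training examples:

```python
# A minimal sketch of building the empirical distributions by counting,
# e.g. P(r | pt, t) = #(r, pt, t) / #(pt, t). Field names are illustrative.
from collections import Counter, defaultdict

def train_distribution(examples, condition):
    """examples: dicts with a 'role' key plus feature keys;
    condition: feature names to condition on, e.g. ('pt', 't')."""
    counts = defaultdict(Counter)
    for ex in examples:
        counts[tuple(ex[f] for f in condition)][ex["role"]] += 1
    return {event: {role: n / sum(roles.values())
                    for role, n in roles.items()}
            for event, roles in counts.items()}

examples = [
    {"role": "Agt", "pt": "NP", "gf": "Subj", "t": "abduct"},
    {"role": "Thm", "pt": "NP", "gf": "Subj", "t": "abduct"},
    {"role": "Thm", "pt": "NP", "gf": "Obj",  "t": "abduct"},
]
p = train_distribution(examples, ("pt", "t"))
print(p[("NP", "abduct")])   # {'Agt': 0.333..., 'Thm': 0.666...}
```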
5 Results
Results for different methods of combining the probability distributions described in the previous section are shown in Table 4. The linear interpolation method simply averages the probabilities given by each of the distributions in Table 2:
$$P(r \mid constituent) = \lambda_1 P(r \mid t) + \lambda_2 P(r \mid pt, t) + \lambda_3 P(r \mid pt, gf, t) + \lambda_4 P(r \mid pt, position, voice) + \lambda_5 P(r \mid pt, position, voice, t) + \lambda_6 P(r \mid h) + \lambda_7 P(r \mid h, t) + \lambda_8 P(r \mid h, pt, t)$$

where $\sum_i \lambda_i = 1$. The geometric mean, expressed in the log domain, is similar:

$$P(r \mid constituent) = \frac{1}{Z} \exp\{\lambda_1 \log P(r \mid t) + \lambda_2 \log P(r \mid pt, t) + \lambda_3 \log P(r \mid pt, gf, t) + \lambda_4 \log P(r \mid pt, position, voice) + \lambda_5 \log P(r \mid pt, position, voice, t) + \lambda_6 \log P(r \mid h) + \lambda_7 \log P(r \mid h, t) + \lambda_8 \log P(r \mid h, pt, t)\}$$

where $Z$ is a normalizing constant ensuring that $\sum_r P(r \mid constituent) = 1$.
The results shown in Table 4 reflect equal values of λ for each distribution defined for the relevant conditioning event (but excluding distributions for which the conditioning event was not seen in the training data).
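A minimal sketch of the two combination rules, assuming the probabilities for a constituent have already been looked up from whichever distributions cover it, with equal weights as in the experiments:

```python
# A minimal sketch of the two combination rules with equal weights over
# the covered distributions. `dists` maps a distribution name to its
# {role: probability} table for the constituent at hand.
import math

def linear_interpolation(dists):
    roles = {r for d in dists.values() for r in d}
    lam = 1.0 / len(dists)
    return {r: sum(lam * d.get(r, 0.0) for d in dists.values())
            for r in roles}

def geometric_mean(dists):
    roles = {r for d in dists.values() for r in d}
    lam = 1.0 / len(dists)
    # Small floor in place of unsmoothed zeros, so the log is defined.
    scores = {r: math.exp(sum(lam * math.log(d.get(r, 1e-9))
                              for d in dists.values()))
              for r in roles}
    z = sum(scores.values())    # normalize so the scores sum to one
    return {r: s / z for r, s in scores.items()}

dists = {"P(r|t)":   {"Agt": 0.4, "Thm": 0.6},
         "P(r|h,t)": {"Agt": 0.9, "Thm": 0.1}}
print(sorted(linear_interpolation(dists).items()))
# [('Agt', 0.65), ('Thm', 0.35)], up to rounding
```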
Distribution                      Coverage   Accuracy   Performance
P(r | t)                          100%       40.9%      40.9%
P(r | pt, t)                      92.5       60.1       55.6
P(r | pt, gf, t)                  92.0       66.6       61.3
P(r | pt, position, voice)        98.8       57.1       56.4
P(r | pt, position, voice, t)     90.8       70.1       63.7
P(r | h)                          80.3       73.6       59.1
P(r | h, t)                       56.0       86.6       48.5
P(r | h, pt, t)                   50.1       87.4       43.8

Table 2: Distributions Calculated for Semantic Role Identification: r indicates semantic role, pt phrase type, gf grammatical function, h head word, and t target word, or predicate.
P(r | pt, gf, t)                                        Count in training data
P(r = Agt | pt = NP, gf = Subj, t = abduct)   = .46     6
P(r = Thm | pt = NP, gf = Subj, t = abduct)   = .54     7
P(r = Thm | pt = NP, gf = Obj, t = abduct)    = 1       9
P(r = Agt | pt = PP, t = abduct)              = .33     1
P(r = Thm | pt = PP, t = abduct)              = .33     1
P(r = CoThm | pt = PP, t = abduct)            = .33     1
P(r = Manr | pt = ADVP, t = abduct)           = 1       1

Table 3: Sample probabilities for P(r | pt, gf, t) calculated from training data for the verb "abduct". The variable gf is only defined for noun phrases. The roles defined for the removing frame in the motion domain are: Agent, Theme, CoTheme ("... had been abducted with him") and Manner.
Other schemes for choosing values of λ, including giving more weight to distributions for which more training data was available, were found to have relatively little effect. We attribute this to the fact that the evaluation depends only on the ranking of the probabilities rather than their exact values.
[Figure 3 (lattice diagram) over the distributions P(r | h, pt, t), P(r | h, t), P(r | h), P(r | pt, gf, t), P(r | pt, position, voice, t), P(r | pt, position, voice), P(r | pt, t), and P(r | t).]

Figure 3: Lattice organization of the distributions from Table 2, with more specific distributions towards the top.
In the "backoff" combination method, a lattice was constructed over the distributions in Table 2 from more specific conditioning events to less specific, as shown in Figure 3. The less specific distributions were used only when no data was present for any more specific distribution. As before, probabilities were combined with both linear interpolation and a geometric mean.
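A sketch of the backoff selection follows. The paper does not spell out the exact edges of the lattice, so the generalization structure below is only illustrative; the selection rule (use a distribution only when no more specific distribution has data) is the one described above.

```python
# A sketch of backoff over the lattice: start from the most specific
# distribution and replace any node without training data by its
# generalizations. The edges below are illustrative; only the selection
# rule is taken from the text.
LATTICE = {
    ("h", "pt", "t"): [("h", "t"), ("pt", "gf", "t")],
    ("h", "t"): [("h",)],
    ("h",): [("t",)],
    ("pt", "gf", "t"): [("pt", "t"), ("pt", "position", "voice", "t")],
    ("pt", "position", "voice", "t"): [("pt", "position", "voice"), ("pt", "t")],
    ("pt", "t"): [("t",)],
    ("pt", "position", "voice"): [("t",)],
    ("t",): [],
}

def backoff_distributions(has_data, top=("h", "pt", "t")):
    """Most specific covered distributions, backing off per branch."""
    frontier, used, seen = [top], set(), set()
    while frontier:
        node = frontier.pop()
        if node in seen:
            continue
        seen.add(node)
        if has_data(node):
            used.add(node)
        else:
            frontier.extend(LATTICE[node])
    return used

covered = {("pt", "t"), ("h", "t")}
print(sorted(backoff_distributions(lambda n: n in covered)))
# [('h', 't'), ('pt', 't')]
```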
Combining Method                 Correct
Linear Interpolation             79.5%
Geometric Mean                   79.6%
Backoff, linear interpolation    80.4%
Backoff, geometric mean          79.6%
Baseline: Most common role       40.9%

Table 4: Results on Development Set, 8148 observations.
The final system performed at 80.4% accuracy, which can be compared to the 40.9% achieved by always choosing the most probable role for each target word, essentially chance performance on this task. Results for this system on test data, held out during development of the system, are shown in Table 5.

                   Linear Backoff   Baseline
Development Set    80.4%            40.9%
Test Set           76.9%            40.6%

Table 5: Results on Test Set, using backoff linear interpolation system. The test set consists of 7900 observations.
5.1 Discussion
It is interesting to note that looking at a constituent's position relative to the target word along with active/passive information performed as well as reading grammatical function off the parse tree. A system using grammatical function, along with the head word, phrase type, and target word, but no passive information, scored 79.2%. A similar system using position rather than grammatical function scored 78.8%, nearly identical performance. However, using head word, phrase type, and target word without either position or grammatical function yielded only 76.3%, indicating that while the two features accomplish a similar goal, it is important to include some measure of the constituent's syntactic relationship to the target word. Our final system incorporated both features, giving a further, though not significant, improvement. As a guideline for interpreting these results, with 8176 observations, the threshold for statistical significance with p < .05 is a 1.0% absolute difference in performance.

Use of the active/passive feature made a further improvement: our system using position but no grammatical function or passive information scored 78.8%; adding passive information brought performance to 80.5%. Roughly 5% of the examples were identified as passive uses.

Head words proved to be very accurate indicators of a constituent's semantic role when data was available for a given head word, confirming the importance of lexicalization shown in various other tasks. While the distribution P(r | h, t) can only be evaluated for 56.0% of the data, of those cases it gets 86.7% correct, without use of any of the syntactic features.
5.2 Lexical Clustering
In order to address the sparse coverage of lexical head word statistics, an experiment was carried out using an automatic clustering of head words of the type described in (Lin, 1998). A soft clustering of nouns was performed by applying the co-occurrence model of (Hofmann and Puzicha, 1998) to a large corpus of observed direct object relationships between verbs and nouns. The clustering was computed from an automatically parsed version of the British National Corpus, using the parser of (Carroll and Rooth, 1998). The experiment was performed using only frame elements with a noun as head word. This allowed a smoothed estimate of P(r | h, nt, t) to be computed as Σ_c P(r | c, nt, t) P(c | h), summing over the automatically derived clusters c to which a nominal head word h might belong. This allows the use of head word statistics even when the head word h has not been seen in conjunction with the target word t in the training data. While the unclustered nominal head word feature is correct for 87.6% of cases where data for P(r | h, nt, t) is available, such data was available for only 43.7% of nominal head words. The clustered head word alone correctly classified 79.7% of the cases where the head word was in the vocabulary used for clustering; 97.9% of instances of nominal head words were in the vocabulary. Adding clustering statistics for NP constituents into the full system increased overall performance from 80.4% to 81.2%.
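A minimal sketch of this smoothing step, with the soft cluster memberships P(c | h) and per-cluster role distributions P(r | c, nt, t) represented as illustrative lookup tables:

```python
# A minimal sketch of the cluster-based smoothing,
# P(r | h, nt, t) = sum over c of P(r | c, nt, t) * P(c | h),
# with the trained models represented as illustrative lookup tables.
def smoothed_role_dist(p_r_given_c, p_c_given_h, head):
    """p_r_given_c: {cluster: {role: prob}}, already conditioned on
    (nt, t); p_c_given_h: {head word: {cluster: prob}} from the
    soft clustering."""
    mixed = {}
    for cluster, weight in p_c_given_h.get(head, {}).items():
        for role, prob in p_r_given_c.get(cluster, {}).items():
            mixed[role] = mixed.get(role, 0.0) + weight * prob
    return mixed

p_r_given_c = {"c7": {"Topic": 0.8, "Medium": 0.2}}
p_c_given_h = {"proposal": {"c7": 1.0}}
print(smoothed_role_dist(p_r_given_c, p_c_given_h, "proposal"))
# {'Topic': 0.8, 'Medium': 0.2}
```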
5.3 Automatic Identification of Frame Element Boundaries
The experiments described above have used human-annotated frame element boundaries; here we address how well the frame elements can be found automatically. Experiments were conducted using features similar to those described above to identify constituents in a sentence's parse tree that were likely to be frame elements. The system was given the human-annotated target word and the frame as inputs, whereas a full language understanding system would also identify which frames come into play in a sentence, essentially the task of word sense disambiguation. The main feature used was the path from the target word through the parse tree to the constituent in question, represented as a string of parse tree nonterminals linked by symbols indicating upward or downward movement through the tree, as shown in Figure 4.
[Figure 4 (parse tree): "He ate some pancakes", with the NP "He" marked as a frame element and the verb "ate" as the target word.]

Figure 4: In this example, the path from the frame element "He" to the target word "ate" can be represented as NP↑S↓VP↓V, with ↑ indicating upward movement in the parse tree and ↓ downward movement.
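A minimal sketch of computing this path feature over an nltk.Tree, assuming the tree positions of the constituent and of the target word's preterminal are known; the function name is ours:

```python
# A minimal sketch of the path feature over an nltk.Tree: climb from the
# constituent to the lowest common ancestor, then descend to the target
# word's preterminal.
from nltk import Tree

def path_feature(root: Tree, const_pos: tuple, target_pos: tuple) -> str:
    # The lowest common ancestor is the longest common prefix of the
    # two tree positions.
    i = 0
    while (i < min(len(const_pos), len(target_pos))
           and const_pos[i] == target_pos[i]):
        i += 1
    up = [root[const_pos[:j]].label()
          for j in range(len(const_pos), i - 1, -1)]    # constituent .. LCA
    down = [root[target_pos[:j]].label()
            for j in range(i + 1, len(target_pos) + 1)] # below LCA .. target
    return "↑".join(up) + "".join("↓" + label for label in down)

tree = Tree.fromstring(
    "(S (NP (Pro He)) (VP (V ate) (NP (Det some) (N pancakes))))")
print(path_feature(tree, (0,), (1, 0)))    # NP↑S↓VP↓V
print(path_feature(tree, (1, 1), (1, 0)))  # NP↑VP↓V
```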
The other features used were the identity of the target word and the identity of the constituent's head word. The probability distributions calculated from the training data were P(fe | path), P(fe | path, t), and P(fe | h, t), where fe indicates an event where the parse constituent in question is a frame element, path the path through the parse tree from the target word to the parse constituent, t the identity of the target word, and h the head word of the parse constituent. By varying the probability threshold at which a decision is made, one can plot a precision/recall curve as shown in Figure 5. P(fe | path, t) performs relatively poorly due to fragmentation of the training data (recall only about 30 sentences are available for each target word). While the lexical statistic P(fe | h, t) alone is not useful as a classifier, using it in linear interpolation with the path statistics improves results. Note that this method can only identify frame elements that have a corresponding constituent in the automatically generated parse tree. For this reason, it is interesting to calculate how many true frame elements overlap with the results of the system, relaxing the criterion that the boundaries must match exactly. Results for partial matching are shown in Table 6.
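The resulting decision rule is a simple thresholded score; a minimal sketch, assuming the component probabilities are lookup tables trained as described (the .75/.25 weights are those of the interpolated curve in Figure 5, while Table 6 used P(fe | path) alone with a .5 threshold):

```python
# A minimal sketch of thresholded frame element identification; the
# probability tables are assumed to be trained as described above.
def is_frame_element(p_fe_path, p_fe_ht, path, head, target, threshold=0.5):
    score = (0.75 * p_fe_path.get(path, 0.0)
             + 0.25 * p_fe_ht.get((head, target), 0.0))
    return score >= threshold

p_fe_path = {"NP↑S↓VP↓V": 0.9}
p_fe_ht = {("he", "eat"): 0.4}
print(is_frame_element(p_fe_path, p_fe_ht, "NP↑S↓VP↓V", "he", "eat"))  # True
```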
When the automatically identified constituents were fed through the role labeling system described above, 79.6% of the constituents which had been correctly identified in the first stage were assigned the correct role in the second, roughly equivalent to the performance when assigning roles to constituents identified by hand.
[Figure 5 (plot): precision (y-axis) against recall (x-axis) for P(fe | path), P(fe | path, t), and .75 P(fe | path) + .25 P(fe | h, t).]

Figure 5: Precision/Recall plot for various methods of identifying frame elements. Recall is calculated over only frame elements with matching parse constituents.
6 Conclusion
Our preliminary system is able to automatically label semantic roles with fairly high accuracy, indicating promise for applications in various natural language tasks. Lexical statistics computed on constituent head words were found to be the most important of the features used. While lexical statistics are quite accurate on the data covered by observations in the training set, the sparsity of the data when conditioned on lexical items meant that
combining features was the key to high over-
all performance. While the combined sys-
tem was far more accurate than any feature
Type of Overlap Identi#0Ced Constituents Number
Exactly Matching Boundaries 66#25 5421
Identi#0Ced constituent entirely within true frame element 8 663
True frame elemententirely within identi#0Ced constituent 7 599
Partial overlap 0 26
No match to true frame element 13 972
Table 6: Results on Identifying Frame Elements #28FEs#29, including partial matches. Results
obtained using P#28fejpath#29 with threshold at .5. A total of 7681 constituents were identi#0Ced as
FEs, 8167 FEs were present in hand annotations, of which matching parse constituents were
present for 7053 #2886#25#29.
taken alone, the speci#0Cc method of combina-
tion used was less important.
We plan to continue this work by integrating semantic role identification with parsing, by bootstrapping the system on larger, and more representative, amounts of data, and by attempting to generalize from the set of predicates chosen by FrameNet for annotation to general text.
References

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In Proceedings of the COLING-ACL, Montreal, Canada.

Dan Blaheta and Eugene Charniak. 2000. Assigning function tags to parsed text. In Proceedings of the 1st Annual Meeting of the North American Chapter of the ACL (NAACL), Seattle, Washington.

Glenn Carroll and Mats Rooth. 1998. Valence induction with a head-lexicalized PCFG. In Proceedings of the 3rd Conference on Empirical Methods in Natural Language Processing (EMNLP 3), Granada, Spain.

Michael Collins. 1997. Three generative, lexicalised models for statistical parsing. In Proceedings of the 35th Annual Meeting of the ACL.

Charles J. Fillmore and Collin F. Baker. 2000. FrameNet: Frame semantics meets the corpus. In Linguistic Society of America, January.

Charles Fillmore. 1968. The case for case. In Bach and Harms, editors, Universals in Linguistic Theory, pages 1–88. Holt, Rinehart, and Winston, New York.

Charles J. Fillmore. 1976. Frame semantics and the nature of language. In Annals of the New York Academy of Sciences: Conference on the Origin and Development of Language and Speech, volume 280, pages 20–32.

Marti Hearst. 1999. Untangling text data mining. In Proceedings of the 37th Annual Meeting of the ACL.

Thomas Hofmann and Jan Puzicha. 1998. Statistical models for co-occurrence data. Memo, Massachusetts Institute of Technology Artificial Intelligence Laboratory, February.

Ray Jackendoff. 1972. Semantic Interpretation in Generative Grammar. MIT Press, Cambridge, Massachusetts.

Maria Lapata and Chris Brew. 1999. Using subcategorization to resolve verb class ambiguity. In Joint SIGDAT Conference on Empirical Methods in NLP and Very Large Corpora, Maryland.

Dekang Lin. 1998. Automatic retrieval and clustering of similar words. In Proceedings of the COLING-ACL, Montreal, Canada.

Scott Miller, David Stallard, Robert Bobrow, and Richard Schwartz. 1996. A fully statistical approach to natural language interfaces. In Proceedings of the 34th Annual Meeting of the ACL.

Carl Pollard and Ivan A. Sag. 1994. Head-Driven Phrase Structure Grammar. University of Chicago Press, Chicago.

Ellen Riloff and Mark Schmelzenbach. 1998. An empirical approach to conceptual case frame acquisition. In Proceedings of the Sixth Workshop on Very Large Corpora.

Ellen Riloff. 1993. Automatically constructing a dictionary for information extraction tasks. In Proceedings of the Eleventh National Conference on Artificial Intelligence (AAAI).
