Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, pages 57–66,
Ann Arbor, June 2005. c©2005 Association for Computational Linguistics
Frame Semantic Enhancement of Lexical-Semantic Resources
Rebecca Green
Institute of Advanced Computer Studies
College of Information Studies
University of Maryland
College Park, MD 20742, USA
rgreen@umd.edu
Bonnie J. Dorr
Institute of Advanced Computer Studies
Department of Computer Science
University of Maryland
College Park, MD 20742, USA
bonnie@umiacs.umd.edu
Abstract
SemFrame generates FrameNet-like
frames, complete with semantic roles and
evoking lexical units. This output can
enhance FrameNet by suggesting new
frames, as well as additional lexical units
that evoke existing frames. SemFrame
output can also support the addition of
frame semantic relationships to WordNet.
1 Introduction
The intuition that semantic analysis can make a posi-
tive contribution to language-based applications has
motivated the development of a number of lexical-
semantic resources. Prominent among them are
WordNet,1 PropBank,2 and FrameNet.3 The poten-
tial contribution of these resources is constrained by
the information they contain and the level of effort
involved in their development.
For example, semantic annotation tasks (Baker et
al., 2004) typically assign semantic roles to the ar-
guments of predicates. The benefit of the semantic
annotation is constrained by the presence and quality
of semantic roles in the lexical-semantic resource(s)
used. Gildea and Jurafsky (2002) suggest that the
availability of semantic annotation of this sort is use-
ful for information extraction, word sense disam-
biguation, machine translation, text summarization,
text mining, and speech recognition.
1http://www.cogsci.princeton.edu/˜wn
2http://www.cis.upenn.edu/˜ace
3http://framenet.icsi.berkeley.edu
Other tasks rely on the identification of seman-
tic relationships to recognize lexical chains (sets of
semantically related words that enable a text to be
cohesive) (Morris and Hirst, 1991). The success
of this work is constrained by the set of semantic
relationship types and instantiations underlying the
recognition of lexical chains. As Stokes’s disser-
tation (2004) notes, lexical cohesion has been used
in discourse analysis, text segmentation, word sense
disambiguation, text summarization, topic detection
and tracking, and question answering.
Unfortunately, most lexical-semantic resources,
including those previously mentioned, are the prod-
uct of considerable ongoing human effort. Given
the high development costs associated with these re-
sources, the possibility of enhancing them on the
basis of complementary resources that are produced
automatically is welcome.
This paper demonstrates several of the character-
istics and benefits of SemFrame (Green et al., 2004;
Green and Dorr, 2004), a system that produces such
a resource.
1. SemFrame generates semantic frames in a form
like those of FrameNet, the ostensible gold
standard for semantic frames.
2. Some SemFrame frames correspond to
FrameNet frames. When SemFrame identifies
additional lexical units that evoke the frame,
it bolsters the use of semantic frames for
identifying lexical chains.
3. Some SemFrame frames cover semantic space
not yet investigated in FrameNet, which, be-
57
cause of the labor-intensive nature of its de-
velopment, is incomplete. The identification of
new frames thus helps fill in gaps in FrameNet.
4. In addition to complementing FrameNet, Sem-
Frame could be used as a more systematic
source of semantic roles for PropBank or could
serve as the basis for adding frame semantic re-
lationships to WordNet.
The rest of the paper is organized as follows:
Section 2 discusses lexical-semantic resources that
could be enhanced by using SemFrame’s output.
Section 3 sets out how SemFrame works, with Sub-
sections 3.1 and 3.2 explaining, respectively, the
identification of lexical units that evoke shared se-
mantic frames and the generation of the internal
structure of those frames. Section 4 discusses how
we evaluate SemFrame’s output. Finally, Section 5
summarizes SemFrame’s contributions and sketches
future directions in its development.
2 Lexical-Semantic Resources
Lexical-semantic resources, such as FrameNet and
PropBank, which involve semantic frames and/or
semantic roles, are one kind of resource that Sem-
Frame’s output can enhance. SemFrame could also
benefit a resource like WordNet that captures differ-
ent kinds of semantic relationships. Here we discuss
characteristics of these resources that make them
amenable to enhancement through SemFrame.
2.1 FrameNet
FrameNet documents the semantic and syntactic be-
havior of words with respect to frames. A frame
characterizes a conventional conceptual structure,
for instance, a situation involving risk, a hitting
event, a commercial transaction. Lexical units are
said to evoke a frame. For example, use of the literal
sense of buy introduces into a discourse an expecta-
tion that some object or service (the Goods) passes
from one person (the Seller) to another (the Buyer)
in exchange for something of (presumably equiva-
lent) value (typically Money).
A significant contribution of the FrameNet project
is the creation of frames, which involves the enumer-
ation both of participant roles in the frame (a.k.a,
frame elements, frame slots) and of lexical units that
evoke the frame. As of May 2005, 657 frames have
been defined in FrameNet; approximately 8600 lex-
ical unit/frame associations have been made.
FrameNet’s approach to identifying frames is
“opportunistic” and driven by the corpus data being
annotated. Thus the FrameNet team does not ex-
pect to have a full inventory of frames until a sub-
stantial proportion of the general-purpose vocabu-
lary of English has been analyzed. As the develop-
ment of FrameNet is labor-intensive, supplementing
FrameNet’s frames and evoking lexical units using
data from SemFrame would be beneficial.
2.2 PropBank
Like FrameNet, PropBank (Kingsbury et al., 2002)
is a project aimed at semantic annotation, in this
case of the Penn English Treebank.4 The intent of
PropBank is to provide for “automatic extraction of
relational data” on the basis of consistent labeling
of predicate argument relationships. Typically the
labels/semantic roles are verb-specific (but are of-
ten standardized across synonyms). For example,
the set of semantic arguments for promise, pledge,
etc. (its ‘roleset’) includes the promiser, the person
promised to, and the promised thing or action. These
correspond respectively to FrameNet’s Speaker, Ad-
dressee, and Message elements within the Commit-
ment frame.
The more general labels used in FrameNet and
SemFrame give evidence of a more systematic ap-
proach to semantic argument structure, more eas-
ily promoting the discovery of relationships among
frames. It can be seen from the terminology used
that PropBank is more focused on the individual ar-
guments of the semantic argument structure, while
FrameNet and SemFrame are more focused on the
overall gestalt of the argument structure, that is, the
frame. The use of FrameNet and SemFrame to sug-
gest more generic (that is, frame-relevant) roleset la-
bels would help move PropBank toward greater sys-
tematicity.
4The semantic annotation tasks in the FrameNet and Prop-
Bank projects enable them to link semantic roles and syntactic
behavior. Enhancing and stabilizing its semantic frame inven-
tory must precede the inclusion of such linkage in SemFrame.
58
2.3 WordNet
WordNet is a lexical database for English nouns,
verbs, adjectives, and adverbs. Fine-grained sense
distinctions are recognized and organized into syn-
onym sets (‘synsets’), WordNet’s basic unit of anal-
ysis; each synset has a characterizing gloss, and
most are exemplified through one or more phrases
or sentences.
In addition to the synonymy relationship at the
heart of WordNet, other semantic relationships are
referenced, including, among others, antonymy, hy-
ponymy, troponymy, partonomy, entailment, and
cause-to. On the basis of these relationships, Fell-
baum (1998) noted that WordNet reflected the struc-
ture of frame semantics to a degree, but suggested
that its organization by part of speech would pre-
clude a full frame-semantic approach.
With release 2.0, WordNet added morphological
and topical category relationships that cross over
part-of-speech boundaries. This development relates
to incorporating a full frame-semantic approach in
WordNet in two ways.
First, since the lexical units that evoke a frame are
not restricted to a single part of speech, the ability
to create links between parts of speech is required in
order to encode frame semantic relationships.
Second, topical categories (e.g., slang, meat,
navy, Arthurian legend, celestial body, historical
linguistics, Mafia) have a kinship with semantic
frames, but are not the same. While topical cate-
gory domains map between categories and lexical
items—as do semantic frames—it is often not clear
what internal structure might be posited for a cate-
gory domain. What, for example, would the partici-
pant structure of ‘meat’ look like?
Should WordNet choose to adopt a full frame-
semantic approach, FrameNet and SemFrame
are natural starting points for identifying frame-
semantic relationships between synsets. The most
beneficial enhancement would involve WordNet’s
incorporating FrameNet and/or SemFrame frames
as a separate resource, with a mapping between
WordNet’s synsets and the semantic frame inven-
tory. SemFrame has the extra advantage that its lex-
ical units are already identified as WordNet synsets.
3 Development of SemFrame
There are two main processing stages in producing
SemFrame output: The first establishes verb classes,
while the second generates semantic frames. The
next two subsections describe these stages.
3.1 Establishing Verb Classes
SemFrame adopts a multistep approach to identify-
ing sets of frame-semantically related verb senses.
The basic steps involved in the current version5 of
SemFrame are:
1. Building a graph with WordNet verb synsets as
vertices and semantic relationships as edges
2. Identifying for each vertex a maximal highly
connected component (HCC) (i.e., a highly in-
terconnected subgraph that the vertex is part of)
3. Eliminating HCC’s with undesirable qualities
4. Forming preliminary verb semantic classes by
supplementing HCC’s with reliable semantic
relationships
5. Merging verb semantic classes with a high de-
gree of overlap
Building the Relationships Graph
WordNet 2.0 includes a vast array of semantic
relationships between synsets of the same part of
speech and has now been enhanced with relation-
ships linking synsets of different parts of speech.
Some of these relationships are almost guaranteed
to link synsets that evoke the same frame, while oth-
ers operate within the bounds of a semantic frame
on some occasions, but not others. Among the re-
lationship types in WordNet most fruitful for iden-
tifying verb synsets within the same frame seman-
tic verb class are: synonymy (e.g., buy, purchase,
as collocated within synsets), antonymy (e.g., buy,
5The process of establishing verb classes has been re-
designed. All that has been carried over from the previ-
ous/initial version of SemFrame is the use of some of the same
WordNet relationships. New in the current version are: the use
of relationship types first implemented in WordNet 2.0, the pre-
dominant and exclusive use of WordNet as the source of data
(the previous version used WordNet as a source secondary to
the Longman Dictionary of Contemporary English), and mod-
eling the identification of classes of related verbs as a graph,
specifically through the use of highly connected components.
59
sell), cause-to (e.g., transfer, change hands), en-
tailment (e.g., buy, pay), verb group (e.g., different
commercial senses of buy, morphological derivation
(e.g., buy, buyer),6 and “see also” (e.g., buy, buy
out). Instances of these relationship types for all
verb synsets in WordNet 2.0 are represented as edges
within the graph.
Additional edges are inserted between any two
synsets/vertices related by two or more of the fol-
lowing: clustering of synsets based on the occur-
rence of word stems in their glosses and example
sentences;7 hyperonymy/hyponymy relationships;
and category domain relationships. These three rela-
tionship types are too noisy to be used on their own
for identifying frame semantic relationships among
synsets, but when a relationship is verified by two or
more of these relationships, the likelihood that the
related synsets evoke the same frame is considerably
higher. Table 1 summarizes the number of edges in
the graph supported by each relationship type.
Relationship Type Count
Antonymy 502
Cause-to 218
Entailment 409
Verb group 874
Morphological derivation 8,986
See also 539
Two of: 2,223
Clustering 54,298
Hyperonymy/hyponymy 12,985
Category domain 18,482
Total 13,751
Table 1: Relationship Counts in WordNet 2.0
Identifying Highly Connected Components
(HCC’s)
Step 1 constructs a graph interconnecting thou-
sands of WordNet verb synsets. Identifying sets of
verb synsets likely to evoke the same semantic frame
requires identifying subgraphs with a high degree
of interconnectivity. Empirical investigation has
6SemFrame relates verb synsets with a morphological
derivation relationship to a common noun synset. This includes
verbs related to different members of the shared noun synset.
7Voorhees’ (1986) hierarchical agglomerative clustering al-
gorithm was implemented.
Figure 1: Relationships Subgraph with HCC
shown that “highly connected components” (Har-
tuv and Shamir, 2000)—induced subgraphs of size
k in which every vertex’s connectivity exceeds k2
vertices—identify such sets of verb synsets.8 For
example, in a 5-vertex highly connected component,
each vertex is related to at least 3 other vertices. Fig-
ure 1 shows a portion of the original graph in which
relationship arcs constituting an HCC are given as
solid lines, while those that fail the interconnectivity
threshold are given as dotted lines.
Given an undirected graph, the Hartuv-Shamir al-
gorithm for identifying HCC’s returns zero or more
non-overlapping subgraphs (including zero or more
singleton vertices). But it is inaccurate to assume
that verb synsets evoke only a single frame, as is
suggested by non-overlapping subgraphs.9 For this
reason, we have modified the Hartuv-Shamir algo-
rithm to identify a maximal HCC, if one exists, for
(i.e., that includes) each vertex of the graph. This
modification reduces the effort involved in identify-
ing any single HCC: Since the diameter of a HCC
is no greater than two, only those vertices who are
neighbors of the source vertex, or neighbors of those
neighbors, need to be examined.
8The algorithm for computing HCC’s first finds the mini-
mum cut for a (sub)graph. If the graph meets the highly con-
nected component criterion, the graph is returned, else the al-
gorithm is called recursively on each of the subgraphs created
by the cut. The Stoer-Wagner (1997) algorithm has been imple-
mented for finding the minimum cut.
9Semantic frames can be defined at varying levels of gen-
erality; thus, a given synset may evoke a set of hierarchically
related frames. Words/Synsets may also evoke multiple, unre-
lated frames simultaneously; criticize, for example, evokes both
a Judging frame and a Communication frame.
60
Eliminating Duplicates
Because HCC’s were generated for each vertex
in the relationships graph, considerable duplication
and overlap existed in the output. The output of
step 2 was cleaned up using three filters. First,
duplicate HCC’s were eliminated. Second, any
HCC wholly included within another HCC was
deleted.10 Third, any HCC based only on mor-
phological derivation relationships was deleted. In
SemFrame, all verb synsets morphologically derived
from the same noun synset were related to each
other. Thus all verb synsets derived from a common
noun synset are guaranteed to generate an HCC. If
only such relationships support an HCC, the likeli-
hood that all of the interrelated verb synsets evoke
the same semantic frame is much lower than if other
types of relationships also provide evidence for their
interrelationship.
Supplementing HCC’s
The HCC’s generated in step 2 that survived the
filters implemented in step 3 form the basis of verb
framesets, that is, sets of verb senses that evoke the
same semantic frame. Specifically, all the synsets
represented by vertices in a single HCC form a
frameset.
The connectivity threshold imposed by HCC’s
helps maintain reasonably high precision of the re-
sulting framesets, but is too strict for high recall.
Some types of relationships known to operate within
frame-semantic boundaries generally do not survive
the connectivity threshold cutoff. For example, for
frames of a certain level of generality, if a spe-
cific verb evokes that frame, it is also the case that
its antonym evokes the frame, as antonyms operate
against the backdrop of the same situational con-
text; that is, they share participant structure.11 How-
ever, since antonymy is (only) a lexical relationship
between two word senses, A and B, the tight cou-
pling of A and B is unlikely to be reflected in A’s
being directly related to other synsets that are re-
lated to B and vice-versa. Thus, antonyms are un-
10Given the interest in generating semantic frames of varying
levels of generality, this filter may itself be eliminated in the
future.
11Identifying antonyms is especially helpful in the case of
conversives, as with buy and sell; the inclusion of both in the
frameset promotes discovery of all relevant frame participants,
in this case, both buyer and seller.
likely to be highly connected through WordNet to
other words/synsets that evoke the frame and thus
fail the HCC connectivity threshold. The same ar-
gument can be made for causatively related verbs.
A post-processing step was required therefore to add
to a frameset any verb synsets related through Word-
Net’s antonymy or cause-to relationships to a mem-
ber of the frameset. Similarly, any verb synset en-
tailed by a member of a verb frameset was added to
the frameset.
Other verb synsets fail to survive the connectivity
threshold cutoff because they enter into few relation-
ships of any kind. If a verb synset is related to only
one other verb synset, the assumption is made that it
evokes the same frame as that one other synset; it is
then added to the corresponding frameset.
Lastly, if a synset is related to two or more mem-
bers of a frameset, the likelihood that it evokes the
same semantic frame is reasonably high. Such verb
synsets were added to the frameset if not already
present.
At the end of this phase, any framesets wholly in-
cluded within another frameset were again deleted.
Merging Overlapping Verb Classes
The preceding processes produced many frame-
sets with a significant degree of overlap. For any
two framesets, if at least half of the verb synsets in
both framesets were also members of the other, the
two framesets were merged into a single frameset.
Summary of Stage 1 Results
The above steps generated 1434 framesets, vary-
ing in size from 2 to 25 synsets (see Table 2). Small
framesets dominate the results, with over 60% of the
framesets including only 2 or 3 synsets.
Representative examples of these framesets are
given in Appendix A, where members of each synset
appear in parentheses, followed by the synset’s
gloss. (Examples are ordered by frameset size.)
Smaller and medium-sized framesets generally en-
joy high precision, but many of the largest framesets
would be better split into two or more framesets.
3.2 Generating Semantic Frames
Generating frames from verb framesets relies on
the insight that the semantic arguments of a frame
are largely drawn from nouns associated with verb
61
Frameset Size Count
2 536
3 346
4-5 309
6-8 169
9-12 54
13-25 20
Total 1434
Table 2: Count of Frameset Sizes
synsets in the frameset. In SemFrame’s processing,
these include nouns in the gloss of a verb synset or in
the gloss of its corresponding LDOCE verb sense(s),
as well as nouns (that is, noun synsets) to which
a verb synset is morphologically related and those
naming the category domain to which a verb synset
belongs. In the latter two cases, the nouns come dis-
ambiguated within WordNet, but nouns from glosses
must undergo disambiguation. The set of noun
senses associated with a verb frameset is then ana-
lyzed against the WordNet noun hierarchy, using an
adaptation of Agirre and Rigau’s (1995) conceptual
density measure. This analysis identifies a frame
name and a set of frame participants, all of which
correspond to nodes in the WordNet noun hierarchy.
Disambiguating Nouns from Glosses
First we consider how nouns from WordNet and
LDOCE verb glosses are disambiguated.12 This step
involves looking for matches between the stems of
words in the glosses of WordNet noun synsets that
include the noun needing to be disambiguated, on
the one hand, and the stems of words in the glosses
of all WordNet verb synsets (and corresponding
LDOCE verb senses) in the frameset, on the other
hand.
A similarity score is computed by dividing the
match count by the number of non-stop-word stems
in the senses under consideration. SemFrame favors
predominant senses by examining word senses in
frequency order. Any sense with a non-zero similar-
ity score that is the highest score yet seen is chosen
as an appropriate word sense.
The various nodes within WordNet’s noun net-
12Identification of LDOCE verb senses that correspond to
WordNet verb synsets is carried out using a similar strategy.
work that correspond to a verb frameset—either
through morphological derivation or category do-
main relationships in WordNet or through the dis-
ambiguation of nouns from the glosses of verbs in
the frameset—constitute ‘evidence synsets’ for the
participant structure of the corresponding semantic
frame and form the input for the conceptual density
calculation.
In preparation for use in calculating conceptual
density, evidence synsets are given weights that
take into account the source and basis of the dis-
ambiguation. In the current implementation, noun
synsets related to the frameset through morpho-
logical derivation or shared category domain are
given a weight of 4.0 (the nouns are guaranteed
to be related to the verbs, and disambiguation of
the nouns is built into the fact that relationships
are given between synsets); disambiguated noun
synsets coming from WordNet verb synsets receive
a weight of 2.0 (since the original framesets contain
WordNet synsets, and the disambiguation strategy is
fairly conservative); non-disambiguated nouns com-
ing from LDOCE verbs related to the frameset have
a weight of 0.5 (LDOCE verbs are a step removed
from the original framesets, and the nouns have
not been disambiguated); all other nouns receive a
weight of 1.0. The weight for non-disambiguated
nouns is ultimately distributed across the noun’s
senses, with higher proportions of the weight being
assigned to more frequent senses.
Computing Conceptual Density
The overall idea behind transforming the list of
evidence synsets into a list of participants involves
using the relationship structure of WordNet to iden-
tify an appropriately small set of concepts (i.e.,
synsets) within WordNet that account for (i.e., are
superordinate to) as many of the evidence synsets as
possible; such synsets will be referred to as ‘cover-
ing synsets’.
This task relies on the hypothesis that a frame’s
evidence synsets will not be randomly distributed
across WordNet, but will be clustered in various sub-
trees within the hierarchy. Intuitively, when evi-
dence synsets cluster together, the subtrees in which
they occur will be more dense than those subtrees
where few or no evidence synsets occur. It is hy-
pothesized that the WordNet subtrees with the high-
62
est density are the most likely to correspond to
frame slots. Thus, the task is to identify such clus-
ters/subtrees and then to designate the nodes at the
roots of the subtrees as covering synsets (subject to
certain constraints).
The conceptual density measure we have used has
been inspired by the measure of the same name in
Agirre and Rigau (1995). The conceptual density,
CD(n), of a node n is computed as follows:
CD(n) =
summationtext
iepsilon1descendantsn(wgti ∗ treesizei)
treesizen
Both frame names and frame slots are identified
on the basis of this conceptual density measure, with
the frame name being taken from the node with the
highest conceptual density from a specified group
of subnetworks within the WordNet noun network
(including abstractions, actions, events, phenomena,
psychological features, and states). Frame slots
are subject to a density threshold (based on mean
density and variance), an evidence-synset-support
threshold, and a constraint on the number of pos-
sible slots to be taken from specific subnetworks
within WordNet. Further details on the computation
and interpretation of conceptual density are given
in (Green and Dorr, 2004).
Frame names and frame structures for
the framesets in Appendix A are given
in Appendix B. The full set of Sem-
Frame’s frames (including ca. 30,000 lexical
unit/frame associations) is publicly available at:
http://www.cs.umd.edu/˜rgreen/semframe2.tar.gz.
The correspondence between frameset sizes and
the number of slots generated for the frame is worth
noting, since we have independent evidence about
the number of slots that should be generated. Frames
in FrameNet generally have from 1 to 5 slots (occa-
sionally more). Over 70% of SemFrame’s frames
contain from 1 to 5 frame slots. Of course, generat-
ing an appropriate number of frame slots is not the
same as generating the right frame slots, a determi-
nation that requires empirical investigation.
4 Evaluation
Three student judges evaluated SemFrame’s results,
with 200 frames each assessed by two judges, and
1234 frames each assessed by one judge.
In evaluating a frame, judges began by examining
the set of verb synsets deemed to evoke a common
frame and identified from among them the largest
subset of synsets they considered to evoke the same
frame. This frame—designated the ‘target frame’—
was simply a mental construct in the judge’s mind.
For only 9% of the frame judgments were the judges
unable to identify a target frame.
If a target frame was discerned, judges were
then asked to evaluate whether the WordNet verb
synsets and LDOCE verb senses listed by Sem-
Frame could be used to communicate about the
frame the judge had in mind. This evaluation step
applied to 6147 WordNet verb synsets and 7148
LDOCE verb senses; in the judges’ views, 78% of
the synsets and 68% of the verb senses evoke the
target frame.
Judges were asked how well the frame names
generated by SemFrame capture the overall target
frame. Some 53% of the names were perceived to be
satisfactory (good or excellent), with another 25%
of the names in the right hierarchy. Only 11% of the
names were deemed to be only mediocre and 9% to
be unrelated.
Judges were also asked how well the frame ele-
ment names generated by SemFrame named a par-
ticipant or attribute of the target frame. Here 46%
of the names were found satisfactory, with another
18% of the names consistent with a target frame par-
ticipant, but either too general or too narrow. An-
other 5% of the names were regarded as mediocre
and 30% as unrelated.
Lastly, judges were asked to look for corre-
spondences between target frames and FrameNet
frames. While only 17% of the target frames were
considered equivalent to a FrameNet frame, many
were judged to be hierarchically related; 51% of
the FrameNet frames were judged more general
than the corresponding SemFrame frame, while 8%
were judged more specific. This reflects the need
to combine some number of SemFrame frames.
For 23% of the SemFrame frames, even the best
FrameNet match was considered only mediocre.
These may represent viable frames not yet recog-
nized by FrameNet. Judges also found 3668 verbs
in SemFrame that could be appropriately listed for a
corresponding frame in FrameNet, but were not.
These results reveal SemFrame’s strengths in in-
63
ducing frames by enumerating sets of verbs that
evoke a shared frame and in naming such frames.
SemFrame’s ability to postulate names for the ele-
ments of a frame is less robust, although results in
this area are still noteworthy.
5 Conclusion and Future Work
SemFrame’s output can be used to enhance lexical-
semantic resources in various ways. For example,
WordNet has recently incorporated new relationship
types, some of which touch on frame semantic re-
lationships. But frame semantic relationships are as
yet only implicit in WordNet; not all morphological
derivation relationships, for example, operate within
a frame. Should WordNet choose to reflect frame
semantic relationships, SemFrame would provide a
useful point of departure, since the verb framesets,
frame names, and frame slots are all already ex-
pressed as WordNet synsets.
SemFrame can also add to FrameNet. The ex-
tensive human effort that has gone into FrameNet
is overwhelmingly evident in the quality of its
frame structures (and attendant annotations). Sem-
Frame is unlikely ever to compete with FrameNet
on this score. However, SemFrame has identi-
fied frames not recognized in FrameNet, e.g., Sem-
Frame’s SOILING frame. SemFrame has like-
wise identified lexical units appropriate to FrameNet
frames that have not yet been incorporated into
FrameNet, e.g., stick to, stick with, and abide by in
the COMPLIANCE / CONFORMITY frame. These
contributions would add as well to the semantic rep-
resentations in PropBank. Since identifying frames
and their evoking lexical units from scratch re-
quires more effort than assessing the general qual-
ity of proposed frames and lexical units—indeed,
since there is currently no other systematic way in
which to identify either a universal set of seman-
tic frames or the set of lexical items that evoke a
frame—SemFrame’s ability to propose new frames
and new evoking lexical units constitutes a major
contribution to the development of lexical-semantic
resources.
SemFrame’s current results might themselves be
enhanced by considering data from other parts of
speech. For instance, at present SemFrame bases
all its frames on verb framesets, but some FrameNet
frames list only adjectives as evoking lexical units.
At the same time, potentially more can be done in as-
sociating verb synsets with frames: Only one-third
of WordNet’s verb synsets are now included in Sem-
Frame’s output. Some of those not now included
evoke none of SemFrame’s current frames, but some
do and have not yet been recognized. Ways of estab-
lishing hierarchical and compositional relationships
among frames should also be investigated.
The above suggestions for enhancing SemFrame
notwithstanding, major progress in improving Sem-
Frame awaits incorporation of corpus data. Rely-
ing on data from lexical resources has contributed
to SemFrame’s precision, but the data sparseness
bottleneck that SemFrame faces is nonetheless real.
On the basis of the lexical resource data used, verb
synsets are related on average to only 5 nouns, many
of which closely reflect the participant structure of
the corresponding frame. However, it is not uncom-
mon for specific elements of the participant structure
to go unrepresented, and any nouns in the dataset
that are not particularly reflective of the participant
structure carry far too much weight amidst such a
paucity of data.
In contrast, the number of nouns that co-occur
with a verb in a corpus may be orders of magnitude
greater.13 But the nouns in a corpus are less likely
to reflect closely the participant structure of the cor-
responding frame; many more nouns are thus likely
to be needed. Furthermore, word sense disambigua-
tion will be required to assign to a frame only those
nouns corresponding to an appropriate sense of the
verb.14 We are optimistic, however, that the pres-
ence of additional corpus data will help fill in frame
element gaps arising from the sparseness of lexical
resource data and can also be used to help reduce the
impact of nouns from lexical resource data that are
not representative of a frame’s participant structure.
Coupled with subject-specific resources, the anal-
ysis of corpus data may then lead to the development
13We are investigating two levels of noun-verb co-
occurrence. The first counts co-occurrences of all nouns and
verbs appearing within the same paragraph of newswire texts.
The second counts only those nouns related to verbs as their
subjects, direct objects, indirect objects, or as objects of prepo-
sitional phrases that modify the verb.
14We make the simplifying assumption that if a noun occurs
with some reasonable percentage of the verbs within a frameset,
the desired verb sense is in play.
64
of subject-specific frame inventories. Such invento-
ries can in turn inform such knowledge-intensive ap-
plications as information retrieval, information ex-
traction, and question answering.
Acknowledgements
This work has been supported in part by NSF ITR
Grant IIS-0326553.

References
Eneko Agirre and German Rigau. A proposal for word
sense disambiguation using conceptual distance. 1st
International Conference on Recent Advances in NLP.
Collin Baker, Jan Hajic, Martha Palmer, and Manfred
Pinkal. 2004. Beyond syntax: Predicates, arguments,
valency frames and linguistic annotations. Tutorial at
42nd Annual Meeting of the Association of Computa-
tional Linguistics.
Christiane Fellbaum (Ed.) 1998. Introduction. In
C. Fellbaum (Ed.), WordNet: An Electronic Lexical
Database. MIT Press.
Daniel Gildea and Daniel Jurafsky. 2002. Automatic la-
beling of semantic roles. Computational Linguistics,
28(3): 245–288.
Rebecca Green, Bonnie J. Dorr, and Philip Resnik. 2004.
Inducing frame semantic verb classes from WordNet
and LDOCE. 42nd Annual Meeting of the Association
of Computational Linguistics.
Rebecca Green and Bonnie J. Dorr. 2004. Inducing a
semantic frame lexicon from WordNet data. Work-
shop on Text Meaning and Interpretation, 42nd Annual
Meeting of the Association of Computational Linguis-
tics.
Erez Hartuv and Ron Shamir. 2000. A clustering algo-
rithm based on graph connectivity. Information Pro-
cessing Letters, 76:175–181.
Paul Kingsbury, Martha Palmer, and Mitch Marcus.
2002. Adding Semantic Annotation to the Penn Tree-
bank. Proceedings of the Human Language Technol-
ogy Conference.
Jane Morris and Graeme Hirst. 1991. Lexical cohe-
sion computed by thesaural relations as an indicator
of the structure of text. Computational Linguistics,
18(1):21–48.
Paul Procter (Ed.) 1978. Longman Dictionary of Con-
temporary English. Longman Group Ltd.
Mechthild Stoer and Frank Wagner. 1997. A simple min-
cut algorithm. Journal of the ACM, 44(4):585–591.
Nicola Stokes. 2004. Applications of lexical cohe-
sion: Analysis in the topic detection and tracking do-
main. Ph.D. dissertation, National University of Ire-
land, Dublin.
Ellen Voorhees. 1986. Implementing agglomerative
hierarchic clustering algorithms for use in document
retrieval. Information Processing & Management,
22(6):465–476.
