Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, pages 38–47,
Ann Arbor, June 2005. c©2005 Association for Computational Linguistics
Automatically Distinguishing Literal and Figurative Usages
of Highly Polysemous Verbs
Afsaneh Fazly and Ryan North and Suzanne Stevenson
Department of Computer Science
University of Toronto
a0 afsaneh,ryan,suzanne
a1 @cs.toronto.edu
Abstract
We investigate the meaning extensions
of very frequent and highly polysemous
verbs, both in terms of their compositional
contribution to a light verb construction
(LVC), and the patterns of acceptability of
the resulting LVC. We develop composi-
tionality and acceptability measures that
draw on linguistic properties specific to
LVCs, and demonstrate that these statisti-
cal, corpus-based measures correlate well
with human judgments of each property.
1 Introduction
Due to a cognitive priority for concrete, easily visu-
alizable entities, abstract notions are often expressed
in terms of more familiar and concrete things and
situations (Newman, 1996; Nunberg et al., 1994).
This gives rise to a widespread use of metaphor
in language. In particular, certain verbs easily un-
dergo a process of metaphorization and meaning
extension (e.g., Pauwels, 2000; Newman and Rice,
2004). Many such verbs refer to states or acts that
are central to human experience (e.g., sit, put, give);
hence, they are often both highly polysemous and
highly frequent. An important class of verbs prone
to metaphorization are light verbs, on which we fo-
cus in this paper.
A light verb, such as give, take, or make, com-
bines with a wide range of complements from differ-
ent syntactic categories (including nouns, adjectives,
and prepositions) to form a new predicate called a
light verb construction (LVC). Examples of LVCs
include:
1. (a) Azin took a walk along the river.
(b) Sam gave a speech to a few students.
(c) Joan takes care of him when I am away.
(d) They made good on their promise to win.
(e) You should always take this into account.
The light verb component of an LVC is “seman-
tically bleached” to some degree; consequently, the
semantic content of an LVC is assumed to be de-
termined primarily by the complement (Butt, 2003).
Nevertheless, light verbs exhibit meaning variations
when combined with different complements. For ex-
ample, give in give (someone) a present has a literal
meaning, i.e., “transfer of possession” of a THING
to a RECIPIENT. In give a speech, give has a figura-
tive meaning: an abstract entity (a speech) is “trans-
ferred” to the audience, but no “possession” is in-
volved. In give a groan, the notion of transfer is
even further diminished.
Verbs exhibiting such meaning variations are
widespread in many languages. Hence, successful
NLP applications—especially those requiring some
degree of semantic interpretation—need to identify
and treat them appropriately. While figurative uses
of a light verb are indistinguishable on the surface
from a literal use, this distinction is essential to a
machine translation system, as Table 1 illustrates. It
is therefore important to determine automatic mech-
anisms for distinguishing literal and figurative uses
of light verbs.
Moreover, in their figurative usages, light verbs
tend to have similar patterns of cooccurrence with
semantically similar complements (e.g., Newman,
1996). Each similar group of complement nouns can
even be viewed as a possible meaning extension for
a light verb. For example, in give advice, give or-
ders, give a speech, etc., give contributes a notion of
38
Sentence in English Intermediate semantics Translation in French
Azin gave Sam a book. (e1/give Azin a donn´e un livre `a Sam.
:agent (a1/“Azin”) Azin gave a book to Sam.
:theme (b1/“book”)
:recepient (s1/“Sam”))
Azin gave the lasagna a try. (e2/give-a-try a0 try Azin a essay´e le lasagne.
:agent (a1/“Azin”) Azin tried the lasagna.
:theme (l1/“lasagna”))
Table 1: Sample sentences with literal and figurative usages of give.
“abstract transfer”, while in give a groan, give a cry,
give a moan, etc., give contributes a notion of “emis-
sion”. There is much debate on whether light verbs
have one highly abstract (underspecified) meaning,
further determined by the context, or a number of
identifiable (related) subsenses (Pustejovsky, 1995;
Newman, 1996). Under either view, it is important
to elucidate the relation between possible interpreta-
tions of a light verb and the sets of complements it
can occur with.
This study is an initial investigation of techniques
for the automatic discovery of meaning extensions
of light verbs in English. As alluded to above, we
focus on two issues: (i) the distinction of literal ver-
sus figurative usages, and (ii) the role of semanti-
cally similar classes of complements in refining the
figurative meanings.
In addressing the first task, we note the connection
between the literal/figurative distinction and the de-
gree to which a light verb contributes composition-
ally to the semantics of an expression. In Section 2,
we elaborate on the syntactic properties that relate
to the compositionality of light verbs, and propose
a statistical measure incorporating these properties,
which places light verb usages on a continuum of
meaning from literal to figurative. Figure 1(a) de-
picts such a continuum in the semantic space of give,
with the literal usages represented as the core.
The second issue above relates to our long-term
goal of dividing the space of figurative uses of a
light verb into semantically coherent segments, as
shown in Figure 1(b). Section 3 describes our hy-
pothesis on the class-based nature of the ability of
potential complements to combine with a light verb.
At this point we cannot spell out the different figura-
tive meanings of the light verb associated with such
classes. We take a preliminary step in proposing a
statistical measure of the acceptability of a combi-
nation of a light verb and a class of complements,
and explore the extent to which this measure can re-
veal class-based behaviour.
Subsequent sections of the paper present the cor-
pus extraction methods for estimating our composi-
tionality and acceptability measures, the collection
of human judgments to which the measures will be
compared, experimental results, and discussion.
2 Compositionality of Light Verbs
2.1 Linguistic Properties: Syntactic Flexibility
We focus on a broadly-documented subclass of light
verb constructions, in which the complement is an
activity noun that is often the main source of seman-
tic predication (Wierzbicka, 1982). Such comple-
ments are assumed to be indefinite, non-referential
predicative nominals (PNs) that are often morpho-
logically related to a verb (see the complements in
examples (1a–c) above). We refer to this class of
light verb constructions as “LV+PN” constructions,
or simply LVCs.
There is much linguistic evidence that semantic
properties of a lexical item determine, to a large ex-
tent, its syntactic behaviour (e.g., Rappaport Hovav
and Levin, 1998). In particular, the degree of com-
positionality (decomposability) of a multiword ex-
pression has been known to affect its participation
in syntactic transformations, i.e., its syntactic flexi-
bility (e.g., Nunberg et al., 1994). English “LV+PN”
constructions enforce certain restrictions on the syn-
tactic freedom of their noun components (Kearns,
2002). In some, the noun may be introduced by a
definite article, pluralized, passivized, relativized, or
even wh-questioned:
39
give a book
give a present
give money
give rightgive advice
give opportunity
give orders
give permission
give a speech
give a smile
give a laugh give a yell
give a groan
give a sweep
give a push
give a dust
give a wipe
give a pull
give a kick
more figurative give a bookgive a present
give money
give a wipe
give a sweep
give a dust
give a push
give a kick
give a pull
give orders
give a speech
give advice give permission
give right
give opportunity
give a yell
give a laugh
give a groan
give a smile
(a) (b)
Figure 1: Two possible partitionings of the semantic space of give.
2. (a) Azin gave a speech to a few students.
(b) Azin gave the speech just now.
(c) Azin gave a couple of speeches last night.
(d) A speech was given by Azin just now.
(e) Which speech did Azin give?
Others have little or no syntactic freedom:
3. (a) Azin gave a groan just now.
(b) * Azin gave the groan just now.
(c) ? Azin gave a couple of groans last night.
(d) * A groan was given by Azin just now.
(e) * Which groan did Azin give?
Recall that give in give a groan is presumed to be
a more abstract usage than give in give a speech. In
general, the degree to which the light verb retains
aspects of its literal meaning—and contributes them
compositionally to the LVC—is reflected in the de-
gree of syntactic freedom exhibited by the LVC. We
exploit this insight to devise a statistical measure of
compositionality, which uses evidence of syntactic
(in)flexibility of a potential LVC to situate it on a
scale of literal to figurative usage of the light verb:
i.e., the more inflexible the expression, the more fig-
urative (less compositional) the meaning.
2.2 A Statistical Measure of Compositionality
Our proposed measure quantifies the degree of syn-
tactic flexibility of a light verb usage by looking
at its frequency of occurrence in any of a set of
relevant syntactic patterns, such as those in exam-
ples (2) and (3). The measure, COMP
a0 LV
a1 Na2 , as-
signs a score to a given combination of a light verb
(LV) and a noun (N):
COMP
a0 LV
a1 Na2a4a3
ASSOC
a0 LV;N
a2a6a5
DIFF
a0 A
SSOC
a0 LV;N
a1 PSposa2a7a1 ASSOC
a0 LV;N
a1 PSnega2a8a2
That is, the greater the association between LV and
N, and the greater the difference between their asso-
ciation with positive syntactic patterns and negative
syntactic patterns, the more figurative the meaning
of the light verb, and the higher the score.
The strength of the association between the light
verb and the complement noun is measured using
pointwise mutual information (PMI) whose standard
formula is given here:1
ASSOC
a0 LV;N
a2a9a3 log
Pra0 LVa1 Na2
Pra0 LVa2 Pra0 Na2
a10 log
n f a0 LVa1 Na2
f a0 LVa2 f a0 Na2
where n is an estimate of the total number of verb
and object noun pairs in the corpus.
1PMI is subject to overestimation for low frequency items
(Dunning, 1993), thus we require a minimum frequency of oc-
currence for the expressions under study.
40
PSpos represents the set of syntactic patterns pre-
ferred by less-compositional (more figurative) LVCs
(e.g., as in (3a)), and PSneg represents less preferred
patterns (e.g., those in (3b–e)). Typically, these pat-
terns most affect the expression of the complement
noun. Thus, to measure the strength of association
between an expression and a set of patterns, we use
the PMI of the light verb, and the complement noun
appearing in all of the patterns in the set, as in:
ASSOC
a0 LV;N
a1 PSposa2 a3 PMI
a0 LV;N
a1 PSposa2
a3 log
Pra0 LVa1 Na1 PSposa2
Pra0 LVa2 Pra0 Na1 PSposa2
a10 log
n f a0 LVa1 Na1 PSposa2
f a0 LVa2 f a0 Na1 PSposa2
in which counts of occurrences of N in syntactic
contexts represented by PSpos are summed over all
patterns in the set. ASSOC(LV;Na1 PSneg) is defined
analogously using PSneg in place of PSpos.
DIFF measures the difference between the asso-
ciation strengths of the positive and negative pat-
tern sets, referred to as ASSOC pos and ASSOCneg,
respectively. Our calculation of ASSOC uses max-
imum likelihood estimates of the true probabilities.
To account for resulting errors, we compare the two
confidence intervals, a1 ASSOC pos
a2
∆ASSOC pos
a3
and
a1 A
SSOCneg
a2
∆ASSOCneg
a3
, as in Lin (1999). We take
the minimum distance between the two as a conser-
vative estimate of the true difference:
DIFF
a0 A
SSOC
a0 LV;N
a1 PSposa2a7a1 ASSOC
a0 LV;N
a1 PSnega2a8a2
a10
a0 A
SSOC pos
a4
∆ASSOCposa2
a4
a0 A
SSOCneg a5 ∆ASSOCnega2
Taking the difference between confidence intervals
lessens the effect of differences that are not statisti-
cally significant. (The confidence level, 1
a4
α, is set
to 95% in all experiments.)
3 Acceptability Across Semantic Classes
3.1 Linguistic Properties: Class Behaviour
In this aspect of our work, we narrow our focus onto
a subclass of “LV+PN” constructions that have a PN
complement in a stem form identical to a verb, pre-
ceded (typically) by an indefinite determiner (as in
(1a–b) above). Kearns (2002), Wierzbicka (1982),
and others have noted that the way in which LVs
combine with such PNs to form acceptable LVCs
is semantically patterned—that is, PNs with similar
semantics appear to have the same trends of cooc-
currence with an LV.
Our hypothesis is that semantically similar
LVCs—i.e., those formed from an LV plus any of
a set of semantically similar PNs—distinguish a fig-
urative subsense of the LV. In the long run, if this is
true, it could be exploited by using class information
to extend our knowledge of acceptable LVCs and
their likely meaning (cf. such an approach to verb
particle constructions by Villavicencio, 2003).
As steps to achieving this long-term goal, we must
first devise an acceptability measure which deter-
mines, for a given LV, which PNs it successfully
combines with. We can even use this measure to
provide evidence on whether the hypothesized class-
based behaviour holds, by seeing if the measure ex-
hibits differing behaviour across semantic classes of
potential complements.
3.2 A Statistical Measure of Acceptability
We develop a probability formula that captures the
likelihood of a given LV and PN forming an accept-
able LVC. The probability depends on both the LV
and the PN, and on these elements being used in an
LVC:
ACPT
a0 LV
a1 PNa2
a3 Pr a0 LVa1 PNa1 LVCa2
a3 Pr
a0 PN
a2 Pr
a0 LVC
a5 PNa2 Pr
a0 LV
a5 PNa1 LVCa2
The first factor, Pra0 PNa2 , reflects the linguistic
observation that higher frequency words are more
likely to be used as LVC complements (Wierzbicka,
1982). We estimate this factor by f a0 PNa2a7a6 n, where n
is the number of words in the corpus.
The probability that a given LV and PN form an
acceptable LVC further depends on how likely it is
that the PN combines with any light verbs to form an
LVC. The frequency with which a PN forms LVCs is
estimated as the number of times we observe it in the
prototypical “LV a/an PN” pattern across LVs. (Note
that such counts are an overestimate, since we can-
not determine which usages are indeed LVCs vs. lit-
eral uses of the LV.) Since these counts consider the
PN only in the context of an indefinite determiner,
41
we normalize over counts of “a/an PN” (noted as
aPN) to form the conditional probability estimate of
the second factor:
Pr a0 LVCa5 PNa2 a10
v
∑
ia0 1
f a0 LVia1 aPNa2
f a0 aPNa2
where v is the number of light verbs considered.
The third factor, Pra0 LV a5 PNa1 LVCa2 , reflects that
different LVs have varying degrees of acceptability
when used with a given PN in an LVC. We similarly
estimate this factor with counts of the given LV and
PN in the typical LVC pattern: f a0 LVa1 aPNa2a7a6 f a0 aPNa2 .
Combining the estimates of the three factors
yields:
ACPT
a0 LV
a1 PNa2
a10
f a0 PNa2
n
a1
v
∑
ia0 1
f a0 LVia1 aPNa2
f a0 aPNa2
a1
f a0 LVa1 aPNa2
f a0 aPNa2
4 Materials and Methods
4.1 Light Verbs
Common light verbs in English include give, take,
make, get, have, and do, among others. We focus
here on two of them, i.e., give and take, that are
frequently and productively used in light verb con-
structions, and are highly polysemous. The Word-
Net polysemy count (number of different senses) of
give and take are 44 and 42, respectively.
4.2 Experimental Expressions
Experimental expressions—i.e., potential LVCs us-
ing give and take—are drawn from two sources.
The development and test data used in experiments
of compositionality (bncD and bncT, respectively)
are randomly extracted from the BNC (BNC Ref-
erence Guide, 2000), yielding expressions cover-
ing a wide range of figurative usages of give and
take, with complements from different semantic cat-
egories. In contrast, in experiments that involve ac-
ceptability, we need figurative usages of “the same
type”, i.e., with semantically similar complement
nouns, to further examine our hypothesis on the
class-based behaviour of light verb combinations.
Since in these LVCs the complement is a predica-
tive noun in stem form identical to a verb, we form
development and test expressions by combining give
or take with verbs from selected semantic classes of
Levin (1993), taken from Stevenson et al. (2004).
4.3 Corpora
We gather estimates for our COMP measure from the
BNC, processed using the Collins parser (Collins,
1999) and TGrep2 (Rohde, 2004). Because some
LVCs can be rare in classical corpora, our ACPT es-
timates are drawn from the World Wide Web (the
subsection indexed by AltaVista). In our compari-
son of the two measures, we use web data for both,
using a simplified version of COMP. The high level
of noise on the web will influence the performance
of both measures, but COMP more severely, due to
its reliance on comparisons of syntactic patterns.
Web counts are based on an exact-phrase query to
AltaVista, with the number of pages containing the
search phrase recorded as its frequency.2 The size
of the corpus is estimated at 3.7 billion, the number
of hits returned in a search for the. These counts are
underestimates of the true frequencies, as a phrase
may appear more than once in a web page, but we
assume all counts to be similarly affected.
4.4 Extraction
Most required frequencies are simple counts of a
word or string of words, but the syntactic patterns
used in the compositionality measure present some
complexity. Recall that PSpos and PSneg are pattern
sets representing the syntactic contexts of interest.
Each pattern encodes several syntactic attributes: v,
the voice of the extracted expression (active or pas-
sive); d, the type of the determiner introducing N
(definite or indefinite); and n, the number of N (sin-
gular or plural). In our experiments, the set of pat-
terns associated with less-compositional use, PSpos,
consists of the single pattern with values active, in-
definite, and singular, for these attributes. PSneg con-
sists of all patterns with at least one of these at-
tributes having the alternative value.
While our counts on the BNC can use syntac-
tic mark-up, it is not feasible to collect counts on
the web for some of the pattern attributes, such as
voice. We develop two different variations of the
measure, one for BNC counts, and a simpler one for
2All searches were performed March 15–30, 2005.
42
give take
Human Ratings bncD bncT bncD bncT
‘low’ 20 10 36 19
‘medium’ 35 16 9 5
‘high’ 24 10 27 10
Total 79 36 72 34
Table 2: Distribution of development and test expressions with
respect to human compositionality ratings.
web counts. We thus subscript COMP with abbre-
viations standing for each attribute in the measure:
COMPvdn for a measure involving all three attributes
(used on BNC data), and COMPd for a measure in-
volving determiner type only (used on web data).
5 Human Judgments
5.1 Judgments of Compositionality
To determine how well our proposed measure
of compositionality captures the degree of lit-
eral/figurative use of a light verb, we compare its
scores to human judgments on compositionality.
Three judges (native speakers of English with suf-
ficient linguistic knowledge) answered yes/no ques-
tions related to the contribution of the literal mean-
ing of the light verb within each experimental ex-
pression. The combination of answers to these ques-
tions is transformed to numerical ratings, ranging
from 0 (fully non-compositional) to 4 (largely com-
positional). The three sets of ratings yield linearly
weighted Kappa values of .34 and .70 for give and
take, respectively. The ratings are averaged to form
a consensus set to be used for evaluation.3
The lists of rated expressions were biased toward
figurative usages of give and take. To achieve a spec-
trum of literal to figurative usages, we augment the
lists with literal expressions having an average rating
of 5 (fully compositional). Table 2 shows the distri-
bution of the experimental expressions across three
intervals of compositionality degree, ‘low’ (ratings
a0 1), ‘medium’ (1
a1 ratings a1 3), and ‘high’ (rat-
ings a2 3). Table 3 presents sample expressions with
different levels of compositionality ratings.
3We asked the judges to provide short paraphrases for each
expression, and only use those expressions for which the major-
ity of judges expressed the same sense.
Sample Expressions
Human Ratings give take
‘low’ give a squeeze take a shower
‘medium’ give help take a course
‘high’ give a dose take an amount
Table 3: Sample expressions with different levels of composi-
tionality ratings.
5.2 Judgments of Acceptability
Our acceptability measure is compared to the hu-
man judgments gathered by Stevenson et al. (2004).
Two expert native speakers of English rated the ac-
ceptability of each potential “LV+PN” construction
generated by combining give and take with candi-
date complements from the development and test
Levin classes. Ratings were from 1 (unacceptable)
to 5 (completely natural; this was capped at 4 for
test data), allowing for “in-between” ratings as well,
such as 2.5. On test data, the two sets of ratings
yielded linearly weighted Kappa values of .39 and
.72 for give and take, respectively. (Interestingly,
a similar agreement pattern is found in our human
compositionality judgments above.) The consensus
set of ratings was formed from an average of the two
sets of ratings, once disagreements of more than one
point were discussed.
6 Experimental Results
To evaluate our compositionality and acceptability
measures, we compare them to the relevant con-
sensus human ratings using the Spearman rank cor-
relation coefficient, rs. For simplicity, we report
the absolute value of rs for all experiments. Since
in most cases, correlations are statistically signifi-
cant (p a3a5a401), we omit p values; those rs values
for which p is marginal (i.e., a401 a0 p a0 a410) are
subscripted with an “m” in the tables. Correlation
scores in boldface are those that show an improve-
ment over the baseline, PMILVC.
The PMILVC measure is an informed baseline, since
it draws on properties of LVCs. Specifically, PMILVC
measures the strength of the association between a
light verb and a noun appearing in syntactic patterns
preferred by LVCs, i.e., PMILVCa3 PMI
a0 LV;N
a1 PSposa2 .
Assuming that an acceptable LVC forms a detectable
collocation, PMILVC can be interpreted as an informed
baseline for degree of acceptability. PMILVC can also
43
PMILVC COMPvdn
LV Data Set n rs rs
bncT 36 .62 .57
give bncDT 114 .68 .70
bncDT/a 79 .68 .75
bncT 34 .51 .59
take bncDT 106 .52 .61
bncDT/a 68 .63 .72
Table 4: Correlations (rs; n = # of items) between human com-
positionality ratings and COMP measure (counts from BNC).
be considered as a baseline for the degree of compo-
sitionality of an expression (with respect to the light
verb component), under the assumption that the less
compositional an expression, the more its compo-
nents appear as a fixed collocation.
6.1 Compositionality Results
Table 4 displays the correlation scores of the human
compositionality ratings with COMPvdn, our com-
positionality measure estimated with counts from
the BNC. Given the variety of light verb usages
in expressions used in the compositionality data,
we report correlations not only on test data (bncT),
but also on development and test data combined
(bncDT) to get more data points and hence more re-
liable correlation scores. Compared to the baseline,
COMPvdn has generally higher correlations with hu-
man ratings of compositionality.
There are two different types of expressions
among those used in compositionality experiments:
expressions with an indefinite determiner a (e.g.,
give a kick) and those without a determiner (e.g.,
give guidance). Despite shared properties, the two
types of expressions may differ with respect to syn-
tactic flexibility, due to differing semantic proper-
ties of the noun complements in the two cases. We
thus calculate correlation scores for expressions with
the indefinite determiner only, from both develop-
ment and test data (bncDT/a). We find that COMPvdn
has higher correlations (and larger improvements
over the baseline) on this subset of expressions.
(Note that there are comparable numbers of items
in bncDT and bncDT/a, and the correlation scores
are highly significant—very small p values—in both
cases.)
To explore the effect of using a larger but noisier
corpus, we compare the performance of COMPvdn
Levin class: 18.1,2 30.3 43.2
LV n=35 n=18 n=35
give % fair/good ratings 51 44 54
log of mean ACPT -6 -4 -5
take % fair/good ratings 23 28 3
log of mean ACPT -4 -3 -6
Table 5: Comparison of the proportion of human ratings consid-
ered “fair” or “good” in each class, and the log10 of the mean
ACPT score for that class.
with COMPd, the compositionality measure using
web data. The correlation scores for COMPd on
bncDT are .41 and .35, for give and take, respec-
tively, compared to a baseline (using web counts) of
.37 and .32. We find that COMPvdn has significantly
higher correlation scores (larger rs and much smaller
p values), as well as larger improvements over the
baseline. This is a confirmation that using more syn-
tactic information, from less noisy data, improves
the performance of our compositionality measure.4
6.2 Acceptability Results
We have two goals in assessing our ACPT measure:
one is to demonstrate that the measure is indeed in-
dicative of the level of acceptability of an LVC, and
the other is to explore whether it helps to indicate
class-based patterns of acceptability.
Regarding the latter, Stevenson et al. (2004) found
differing overall levels of (human) acceptability for
different Levin classes combined with give and take.
This indicates a strong influence of semantic simi-
larity on the possible LV and complement combina-
tions. Our ACPT measure also yields differing pat-
terns across the semantic classes. Table 5 shows,
for each light verb and test class, the proportion of
acceptable LVCs according to human ratings, and
the log of the mean ACPT score for that LV and
class combination. For take, the ACPT score gener-
ally reflects the difference in proportion of accepted
expressions according to the human ratings, while
for give, the measure is less consistent. (The three
development classes show the same pattern.) The
ACPT measure thus appears to reflect the differing
patterns of acceptability across the classes, at least
4Using the automatically parsed BNC as a source of less
noisy data improves performance. However, since these con-
structions may be infrequent with any particular complement,
we do not expect the use of cleaner but more plentiful text (such
as existing treebanks) to improve the performance any further.
44
Levin PMILVC ACPT
LV Class n rs rs
18.1,2 35 .39m .55
give 30.3 18 .38m .73
43.2 35 .30m .34m
18.1.2 35 .57 .61
take 30.3 18 .55 .64
43.2 35 .43 .47
Table 6: Correlations (rs; n = # of items) between acceptability
measures and consensus human ratings (counts from web).
Human PMILVC ACPT COMPd
Ratings LV n rs rs rs
accept. give 88 .31 .42 .40
(Levin) take 88 .58 .61 .56
compos. give 114 .37 .21m .41
(bncDT) take 106 .32 .30 .35
Table 7: Correlations (rs; n = # of items) between each measure
and each set of human ratings (counts from web).
for take.
To get a finer-grained notion of the degree to
which ACPT conforms with human ratings, we
present correlation scores between the two, in
Table 6. The results show that ACPT has higher
correlation scores than the baseline—substantially
higher in the case of give. The correlations for give
also vary more widely across the classes.
These results together indicate that the accept-
ability measure may be useful, and indeed taps into
some of the differing levels of acceptability across
the classes. However, we need to look more closely
at other linguistic properties which, if taken into ac-
count, may improve the consistency of the measure.
6.3 Comparing the Two Measures
Our two measures are intended for different pur-
poses, and indeed incorporate differing linguistic in-
formation about LVCs. However, we also noted that
PMILVC can be viewed as a baseline for both, indicat-
ing some underlying commonality. It is worth ex-
ploring whether each measure taps into the differ-
ent phenomena as intended. To do so, we correlate
COMP with the human ratings of acceptability, and
ACPT with the human ratings of compositionality,
as shown in Table 7. (The formulation of the ACPT
measure here is adapted for use with determiner-less
LVCs.) For comparability, both measures use counts
from the web. The results confirm that COMPd cor-
relates better than does ACPT with compositionality
ratings, while ACPT correlates best with acceptabil-
ity ratings.
7 Discussion and Concluding Remarks
Recently, there has been increasing awareness of the
need for appropriate handling of multiword expres-
sions (MWEs) in NLP tasks (Sag et al., 2002). Some
research has concentrated on the automatic acqui-
sition of semantic knowledge about certain classes
of MWEs, such as compound nouns or verb parti-
cle constructions (VPCs) (e.g., Lin, 1999; McCarthy
et al., 2003; Villavicencio, 2003). Previous research
on LVCs, on the other hand, has primarily focused
on their automatic extraction (e.g., Grefenstette and
Teufel 1995; Dras and Johnson 1996; Moir´on 2004;
though see Stevenson et al. 2004).
Like most previous studies that focus on seman-
tic properties of MWEs, we are interested in the is-
sue of compositionality. Our COMP measure aims to
identify a continuum along which a light verb con-
tributes to the semantics of an expression. In this
way, our work combines aspects of earlier work on
VPC semantics. McCarthy et al. (2003) determine a
continuum of compositionality of VPCs, but do not
distinguish the contribution of the individual compo-
nents. Bannard et al. (2003), on the other hand, look
at the separate contribution of the verb and particle,
but assume that a binary decision on the composi-
tionality of each is sufficient.
Previous studies determine compositionality by
looking at the degree of distributional similarity be-
tween an expression and its component words (e.g.,
McCarthy et al., 2003; Bannard et al., 2003; Bald-
win et al., 2003). Because light verbs are highly pol-
ysemous and frequently used in LVCs, such an ap-
proach is not appropriate for determining their con-
tribution to the semantics of an expression. We in-
stead examine the degree to which a light verb usage
is “similar” to the prototypical LVC, through a sta-
tistical comparison of its behaviour within different
syntactic patterns. Syntactic flexibility and semantic
compositionality are known to be strongly correlated
for many types of MWEs (Nunberg et al., 1994). We
thus intend to extend our approach to include other
polysemous verbs with metaphorical extensions.
Our compositionality measure correlates well
with the literal/figurative spectrum represented in
45
human judgments. We also aim to determine finer-
grained distinctions among the identified figurative
usages of a light verb, which appear to relate to the
semantic class of its complement. Semantic class
knowledge may enable us to elucidate the types of
relations between a light verb and its complement
such as those determined in the work of Wanner
(2004), but without the need for the manually la-
belled training data which his approach requires.
Villavicencio (2003) used class-based knowledge to
extend a VPC lexicon, but assumed that an unob-
served VPC is not acceptable. We instead believe
that more robust application of class-based knowl-
edge can be achieved with a better estimate of the
acceptability of various expressions.
Work indicating acceptability of MWEs is largely
limited to collocational analysis using PMI-based
measures (Lin, 1999; Stevenson et al., 2004). We
instead use a probability formula that enables flex-
ible integration of LVC-specific linguistic proper-
ties. Our ACPT measure yields good correlations
with human acceptability judgments; indeed, the av-
erage increase over the baseline is about twice as
high as that of the acceptability measure proposed
by Stevenson et al. (2004). Although ACPT also
somewhat reflects different patterns across seman-
tic classes, the results clearly indicate the need for
incorporating more knowledge into the measure to
capture class-based behaviour more consistently.
The work presented here is preliminary, but is the
first we are aware of to tie together the two issues of
compositionality and acceptability, and relate them
to the notion of class-based meaning extensions of
highly polysemous verbs. Our on-going work is fo-
cusing on the role of the noun component of LVCs,
to determine the compositional contribution of the
noun to the semantics of the expression, and the role
of noun classes in influencing the meaning exten-
sions of light verbs.

References
Baldwin, T., Bannard, C., Tanaka, T., and Wid-
dows, D. (2003). An empirical model of multi-
word expression decomposability. In Proceedings
of the ACL-SIGLEX Workshop on Multiword Ex-
pressions: Analysis, Acquisition and Treatment,
pages 89–96.
Bannard, C., Baldwin, T., and Lascarides, A. (2003).
A statistical approach to the semantics of verb-
particles. In Proceedings of the ACL-SIGLEX
Workshop on Multiword Expressions: Analysis,
Acquisition and Treatment, pages 65–72.
BNC Reference Guide (2000). Reference Guide for
the British National Corpus (World Edition), sec-
ond edition.
Butt, M. (2003). The light verb jungle. Workshop
on Multi-Verb Constructions.
Collins, M. (1999). Head-Driven Statistical Models
for Natural Language Parsing. PhD thesis, Uni-
versity of Pennsylvania.
Dras, M. and Johnson, M. (1996). Death and light-
ness: Using a demographic model to find support
verbs. In Proceedings of the Fifth International
Conference on the Cognitive Science of Natural
Language Processing.
Dunning, T. (1993). Accurate methods for the statis-
tics of surprise and coincidence. Computational
Linguistics, 19(1):61–74.
Grefenstette, G. and Teufel, S. (1995). Corpus-
based method for automatic identification of sup-
port verbs for nominalization. In Proceedings of
the 7th Meeting of the EACL.
Kearns, K. (2002). Light verbs in English.
manuscript.
Levin, B. (1993). English Verb Classes and Alterna-
tions: A Preliminary Investigation. The Univer-
sity of Chicago Press.
Lin, D. (1999). Automatic identification of non-
compositional phrases. In Proceedings of the 37th
Annual Meeting of the ACL, pages 317–324.
McCarthy, D., Keller, B., and Carroll, J. (2003).
Detecting a continuum of compositionality in
phrasal verbs. In Proceedings of the ACL-SIGLEX
Workshop on Multiword Expressions: Analysis,
Acquisition and Treatment.
Moir´on, M. B. V. (2004). Discarding noise in an au-
tomatically acquired lexicon of support verb con-
structions. In Proceedings of the 4th International
Conference on Language Resources and Evalua-
tion (LREC).
Newman, J. (1996). Give: A Cognitive Linguistic
Study. Mouton de Gruyter.
Newman, J. and Rice, S. (2004). Patterns of usage
for English SIT, STAND, and LIE: A cognitively
inspired exploration in corpus linguistics. Cogni-
tive Linguistics, 15(3):351–396.
Nunberg, G., Sag, I. A., and Wasow, T. (1994). Id-
ioms. Language, 70(3):491–538.
Pauwels, P. (2000). Put, Set, Lay and Place: A
Cognitive Linguistic Approach to Verbal Mean-
ing. LINCOM EUROPA.
Pustejovsky, J. (1995). The Generative Lexicon.
MIT Press.
Rappaport Hovav, M. and Levin, B. (1998). Build-
ing verb meanings. In Butt and Geuder, editors,
The Projection of Arguments: Lexical and Com-
putational Factors, pages 97–134. CSLI Publica-
tions.
Rohde, D. L. T. (2004). TGrep2 User Manual.
Sag, I. A., Baldwin, T., Bond, F., Copestake, A., and
Flickinger, D. (2002). Multiword expressions: A
pain in the neck for NLP. In Proceedings of the
3rd International Conference on Intelligent Text
Processing and Computational Linguistics (CI-
CLING’02), pages 1–15.
Stevenson, S., Fazly, A., and North, R. (2004). Sta-
tistical measures of the semi-productivity of light
verb constructions. In Proceedings of the ACL-04
Workshop on Multiword Expressions: Integrating
Processing, pages 1–8.
Villavicencio, A. (2003). Verb-particle construc-
tions and lexical resources. In Proceedings of
the ACL-SIGLEX Workshop on Multiword Ex-
pressions: Analysis, Acquisition and Treatment,
pages 57–64.
Wanner, L. (2004). Towards automatic fine-grained
semantic classification of verb-noun collocations.
Natural Language Engineering, 10(2):95–143.
Wierzbicka, A. (1982). Why can you Have a Drink
when you can’t *Have an Eat? Language,
58(4):753–799.
