Proceedings of Human Language Technology Conference and Conference on Empirical Methods in Natural Language
Processing (HLT/EMNLP), pages 899–906, Vancouver, October 2005. ©2005 Association for Computational Linguistics
Measuring the relative compositionality of verb-noun (V-N) collocations by
integrating features
Sriram Venkatapathy*
Language Technologies Research Centre,
International Institute of Information
Technology - Hyderabad,
Hyderabad, India.
sriram@research.iiit.ac.in
Aravind K. Joshi
Department of Computer and
Information Science and Institute for
Research in Cognitive Science,
University of Pennsylvania,
Philadelphia, PA, USA.
joshi@linc.cis.upenn.edu
Abstract
Measuring the relative compositionality
of Multi-word Expressions (MWEs) is
crucial to Natural Language Processing.
Various collocation based measures have
been proposed to compute the relative
compositionality of MWEs. In this paper,
we define novel measures (both colloca-
tion based and context based measures) to
measure the relative compositionality of
MWEs of V-N type. We show that the
correlation of these features with the hu-
man ranking is much superior to the cor-
relation of the traditional features with the
human ranking. We then integrate the proposed features and the
traditional features using an SVM based ranking function to rank
the collocations of V-N type based on their relative
compositionality. We then show that the correlation between the
ranks computed by the SVM based ranking function and the human
ranking is significantly better than the correlation between the
rankings of individual features and the human ranking.
1 Introduction
The main goal of the work presented in this paper is to examine
the relative compositionality of collocations of V-N type using an
SVM based ranking function. Measuring the relative compositionality
of V-N collocations is extremely helpful in applications such as
machine translation, where the collocations that are highly
non-compositional can be handled in a special way (Schuler and
Joshi, 2004) (Hwang and Sasaki, 2005).
* Part of the work was done at Institute for Research in Cognitive
Science (IRCS), University of Pennsylvania, Philadelphia, PA 19104,
USA, when the first author was visiting IRCS as a Visiting Scholar,
February to December, 2004.
Multi-word expressions (MWEs) are those whose
structure and meaning cannot be derived from their
component words, as they occur independently.
Examples include conjunctions like ‘as well as’
(meaning ‘including’), idioms like ‘kick the bucket’
(meaning ‘die’), phrasal verbs like ‘find out’ (mean-
ing ‘search’) and compounds like ‘village commu-
nity’. A typical natural language system assumes
each word to be a lexical unit, but this assumption
does not hold in case of MWEs (Becker, 1975)
(Fillmore, 2003). They have idiosyncratic interpre-
tations which cross word boundaries and hence are
a ‘pain in the neck’ (Sag et al., 2002). They account
for a large portion of the language used in day-to-
day interactions (Schuler and Joshi, 2004) and so,
handling them becomes an important task.
A large number of MWEs have a standard syn-
tactic structure but are non-compositional semanti-
cally. An example of such a subset is the class of
non-compositional verb-noun collocations (V-N col-
locations). The class of non-compositional V-N col-
locations is important because they are used very
frequently. These include verbal idioms (Nunberg
et al., 1994), support-verb constructions (Abeille,
1988), (Akimoto, 1989), among others. The ex-
pression ‘take place’ is a MWE whereas ‘take a gift’
is not a MWE.
It is well known that one cannot really make a
binary distinction between compositional and non-
compositional MWEs. They do not fall cleanly into
mutually exclusive classes, but populate the con-
tinuum between the two extremes (Bannard et al.,
2003). So, we rate the MWEs (V-N collocations in
this paper) on a scale from 1 to 6 where 6 denotes
a completely compositional expression, while 1 de-
notes a completely opaque expression.
Various statistical measures have been suggested
for ranking expressions based on their composition-
ality. Some of these are Frequency, Mutual Infor-
mation (Church and Hanks, 1989) , distributed fre-
quency of object (Tapanainen et al., 1998) and LSA
model (Baldwin et al., 2003) (Schutze, 1998). In
this paper, we define novel measures (both collo-
cation based and context based measures) to mea-
sure the relative compositionality of MWEs of V-N
type (see section 6 for details). Integrating these sta-
tistical measures should provide better evidence for
ranking the expressions. We use a SVM based rank-
ing function to integrate the features and rank the
V-N collocations according to their compositional-
ity. We then compare these ranks with the ranks
provided by the human judges. A similar comparison between the
ranks according to Latent-Semantic Analysis (LSA) based features
and the ranks of human judges has been made by McCarthy, Keller
and Carroll (McCarthy et al., 2003) for verb-particle
constructions (see Section 3 for more details). Some preliminary
work on recognition of V-N collocations was presented in
(Venkatapathy and Joshi, 2004).
We show that the measures which we have defined contribute greatly
to measuring the relative compositionality of V-N collocations
when compared to the traditional features. We also show that the
ranks assigned by the SVM based ranking function correlate much
better with the human judgement than the ranks assigned by
individual statistical measures.
The rest of this paper is organized into the following sections:
(2) Basic Architecture, (3) Related Work, (4) Data used for the
experiments, (5) Agreement between the Judges, (6) Features,
(7) SVM based ranking function, (8) Experiments and Results, and
(9) Conclusion.
2 Basic Architecture
Every V-N collocation is represented as a vector of
features which are composed largely of various sta-
tistical measures. The values of these features for
the V-N collocations are extracted from the British
National Corpus. For example, the V-N collocation
‘raise an eyebrow’ can be represented as
⟨Frequency = 271, Mutual Information = 8.43, Distributed
frequency of object = 1456.29, etc.⟩. A
SVM based ranking function uses these features to
rank the V-N collocations based on their relative
compositionality. These ranks are then compared
with the human ranking.
3 Related Work
Breidt (1995) evaluated the usefulness of the Point-wise Mutual
Information measure (as suggested by (Church and Hanks, 1989)) for
the extraction of V-N collocations from German text corpora.
Several other measures like Log-Likelihood (Dunning, 1993),
Pearson’s $\chi^2$ (Church et al., 1991), Z-Score (Church et al.,
1991), Cubic Association Ratio (MI3), etc., have also been
proposed. These measures try to quantify the association of two
words but do not address the problem of quantifying the
non-compositionality of MWEs. Dekang Lin
proposes a way to automatically identify the non-
compositionality of MWEs (Lin, 1999). He sug-
gests that a possible way to separate compositional
phrases from non-compositional ones is to check the
existence and mutual-information values of phrases
obtained by replacing one of the words with a sim-
ilar word. According to Lin, a phrase is proba-
bly non-compositional if such substitutions are not
found in the collocations database or their mutual
information values are significantly different from
that of the phrase. Another way of determining the
non-compositionality of V-N collocations is by us-
ing ‘distributed frequency of object’ (DFO) in V-N
collocations (Tapanainen et al., 1998). The basic
idea in there is that “if an object appears only with
one verb (or few verbs) in a large corpus we expect
that it has an idiomatic nature” (Tapanainen et al.,
1998).
Schone and Jurafsky (Schone and Jurafsky, 2001)
applied Latent-Semantic Analysis (LSA) to the anal-
ysis of MWEs in the task of MWE discovery, by way
of rescoring MWEs extracted from the corpus. An
interesting way of quantifying the relative composi-
tionality of a MWE is proposed by Baldwin, Ban-
nard, Tanaka and Widdows (Baldwin et al., 2003).
They use LSA to determine the similarity between an MWE and its
constituent words, and claim that higher similarity indicates
greater decomposability. In terms of compositionality, an
expression is likely to be relatively more compositional if it is
decomposable. They evaluated their model on English NN compounds
and verb-particles, and showed that the model correlated
moderately well with the Wordnet based decomposability theory
(Baldwin et al., 2003).
McCarthy, Keller and Carroll (McCarthy et al., 2003) judge
compositionality according to the degree of overlap in the set of
most similar words to the verb-particle and head verb. They showed
that the correlation between their measures and the human ranking
was better than the correlation between the statistical features
and the human ranking. We have done similar experiments in this
paper, where we compare the correlation value of the ranks
provided by the SVM based ranking function with the ranks of the
individual features for the V-N collocations. We show that the
ranks given by the SVM based ranking function, which integrates
all the features, provide a significantly better correlation than
the individual features.
4 Data used for the experiments
The data used for the experiments is the British National
Corpus of 81 million words. The corpus is
parsed using Bikel’s parser (Bikel, 2004) and the
Verb-Object Collocations are extracted. There are
4,775,697 V-N collocations of which 1.2 million are
unique. All the V-N collocations above the fre-
quency of 100 (n=4405) are taken to conduct the ex-
periments so that the evaluation of the system is fea-
sible. These 4405 V-N collocations were searched in
Wordnet, American Heritage Dictionary and SAID
dictionary (LDC,2003). Around 400 were found in
at least one of the dictionaries. Another 400 were
extracted from the rest so that the evaluation set has
roughly equal number of compositional and non-
compositional expressions. These 800 expressions
were annotated with a rating from 1 to 6 by us-
ing guidelines independently developed by the au-
thors. 1 denotes the expressions which are totally
non-compositional while 6 denotes the expressions
which are totally compositional. A brief explanation of the
various ratings is as follows:
(1) No word in the expression has any relation to the actual
meaning of the expression. Example: “leave a mark”.
(2) Can be replaced by a single verb. Example: “take a look”.
(3) Although meanings of both words are involved, at least one of
the words is not used in the usual sense. Example: “break news”.
(4) Relatively more compositional than (3). Example: “prove a
point”.
(5) Relatively less compositional than (6). Example: “feel safe”.
(6) Completely compositional. Example: “drink coffee”.
5 Agreement between the Judges
The data was annotated by two fluent speakers of
English. For 765 collocations out of 800, both the
annotators gave a rating. For the rest, at least one
of the annotators marked the collocations as “don’t
know”. Table 1 illustrates the details of the annota-
tions provided by the two judges.
Ratings        6    5    4    3    2    1
Annotator1   141  122  127  119  161   95
Annotator2   303   88   79  101  118   76
Table 1: Details of the annotations of the two annotators
From Table 1 we see that annotator1 distributed the ratings more
uniformly among the collocations, while annotator2 judged a
significant proportion of the collocations to be completely
compositional. To measure the agreement between the two
annotators, we used Kendall’s tau ($\tau$) (Siegel and Castellan,
1988). $\tau$ is the correlation between the rankings¹ of
collocations given by the two annotators. $\tau$ ranges between 0
(little agreement) and 1 (full agreement). $\tau$ is defined as,

$$\tau = \frac{\sum_{i<j} \mathrm{sgn}(x_i - x_j)\,\mathrm{sgn}(y_i - y_j)}{\sqrt{(T_0 - T_1)(T_0 - T_2)}}$$

$$T_1 = \sum_i \frac{t_i (t_i - 1)}{2}, \qquad T_2 = \sum_i \frac{u_i (u_i - 1)}{2}, \qquad T_0 = \frac{n (n - 1)}{2}$$

where $x_i$’s are the rankings of annotator1 and $y_i$’s are the
rankings of annotator2, $n$ is the number of collocations, $t_i$
is the number of values in the $i^{th}$ group of tied $x$ values
and $u_i$ is the number of values in the $i^{th}$ group of tied
$y$ values.
¹ computed from the ratings
We obtained a $\tau$ score of 0.61, which is highly significant.
This shows that the annotators were in good agreement with each
other in deciding the rating to be given to the collocations. We
also compare the rankings of the two annotators using Pearson’s
Rank-Correlation coefficient ($r_s$) (Siegel and Castellan, 1988).
We obtained an $r_s$ score of 0.71, indicating a good agreement
between the annotators.
A couple of examples where the annotators differed
are (1) “perform a task” was rated 3 by annotator1
while it was rated 6 by annotator2 and (2) “pay trib-
ute” was rated 1 by annotator1 while it was rated 4
by annotator2.
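The tie-corrected tau used above can be sketched in a few lines of Python. This is only an illustration of the formula from Siegel and Castellan as summarized in this section; the function names and the two short rating lists are hypothetical and are not the annotations used in the paper.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def sign(value):
    """Return -1, 0 or 1 according to the sign of value."""
    return (value > 0) - (value < 0)


def tie_term(values):
    """Sum of t(t - 1) / 2 over every group of t tied values."""
    return sum(t * (t - 1) / 2 for t in Counter(values).values())


def kendall_tau(x, y):
    """Tie-corrected Kendall's tau between two equal-length rating lists."""
    n = len(x)
    numerator = sum(
        sign(x[i] - x[j]) * sign(y[i] - y[j])
        for i, j in combinations(range(n), 2)
    )
    t0 = n * (n - 1) / 2
    # Undefined (division by zero) if one annotator gives every item the same rating.
    return numerator / sqrt((t0 - tie_term(x)) * (t0 - tie_term(y)))


# Hypothetical ratings on the 1-6 scale, invented for illustration.
annotator1 = [6, 5, 3, 2, 1, 4]
annotator2 = [6, 6, 4, 2, 1, 3]
tau = kendall_tau(annotator1, annotator2)
```

Because the ratings come from a six-point scale, ties are frequent, which is why the denominator subtracts the tie terms rather than using the plain pair count.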
The 765 samples annotated by both the annotators
were then divided into a training set and a testing set
in several possible ways to cross-validate the results
of ranking (section 8).
6 Features
Each collocation is represented by a vector whose
dimensions are the statistical features obtained from
the British National Corpus. The features used in
our experiments can be classified as (1) Collocation
based features and (2) Context based features.
6.1 Collocation based features
Collocation based features consider the entire collocation as a
unit and compute the statistical properties associated with it.
The collocation based features that we considered in our
experiments are (1) Frequency, (2) Point-wise Mutual Information,
(3) Least mutual information difference with similar collocations,
(4) Distributed frequency of object and (5) Distributed frequency
of object using the verb information.
6.1.1 Frequency ($f$)
This feature denotes the frequency of a collocation in the British
National Corpus. Cohesive expressions have a high frequency.
Hence, the greater the frequency, the greater the likelihood that
the expression is an MWE.
6.1.2 Point-wise Mutual Information ($I$)
Point-wise mutual information of a collocation (Church and Hanks,
1989) is defined as,

$$I(v, o) = \log \frac{f(v, o) \cdot f(*, *)}{f(v, *) \cdot f(*, o)}$$

where $v$ is the verb and $o$ is the object of the collocation,
and $*$ stands for any word in that position. The higher the
mutual information of a collocation, the greater the likelihood
that the expression is an MWE.
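As a rough illustration, point-wise mutual information can be computed from a list of extracted verb-object pairs as below. This is a minimal sketch: the base-2 logarithm follows the usual Church and Hanks formulation and is an assumption here, and the toy pairs are invented rather than taken from the British National Corpus.

```python
from collections import Counter
from math import log2


def pmi_table(pairs):
    """Map each (verb, object) pair to its point-wise mutual information."""
    pair_freq = Counter(pairs)                    # f(v, o)
    verb_freq = Counter(v for v, _ in pairs)      # f(v, *)
    object_freq = Counter(o for _, o in pairs)    # f(*, o)
    total = len(pairs)                            # f(*, *)
    return {
        (v, o): log2(freq * total / (verb_freq[v] * object_freq[o]))
        for (v, o), freq in pair_freq.items()
    }


# Toy verb-object tokens, invented for illustration only.
tokens = (
    [("take", "place")] * 4
    + [("take", "gift")] * 1
    + [("buy", "gift")] * 3
)
pmi = pmi_table(tokens)
```

In this toy sample ‘take place’ receives a higher value than ‘take a gift’, because ‘gift’ occurs mostly with another verb.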
6.1.3 Least mutual information difference with similar
collocations ($L$)
This feature is based on Lin’s work (Lin, 1999). He suggests that
a possible way to separate compositional phrases from
non-compositional ones is to check the existence and mutual
information values of similar collocations (phrases obtained by
replacing one of the words with a similar word). For example, ‘eat
apple’ is a similar collocation of ‘eat pear’. For a collocation,
we find the similar collocations by substituting the verb and the
object with their similar words². The similar collocation having
the least mutual information difference is chosen and the
difference in their mutual information values is noted.
If a collocation $c$ has a set of similar collocations $S$, then
we define $L$ as

$$L(v_c, o_c) = \min_{s \in S} \, \mathrm{abs}(I(c) - I(s))$$

where $I(c)$ is the point-wise mutual information of the
collocation $c$ (Section 6.1.2), $\mathrm{abs}(x)$ returns the
absolute value of $x$, and $v_c$ and $o_c$ are the verb and object
of the collocation $c$ respectively. If similar collocations do
not exist for a collocation, then this feature is assigned the
highest among the values assigned in the previous equation. In
this case, $L$ is defined as,

$$L(v, o) = \max_{i,j} \, L(v_i, o_j)$$

where $v$ and $o$ are the verb and object of collocations for
which similar collocations do not exist. The higher the value of
$L$, the greater the likelihood that the collocation is an MWE.
² obtained from Lin’s (Lin, 1998) automatically generated
thesaurus (http://www.cs.ualberta.ca/~lindek/downloads.htm).
We obtained the best results (Section 8) when we substituted the
top-5 similar words for both the verb and the object. To measure
compositionality, semantically similar words are more suitable
than synonyms. Hence, we chose to use Lin’s thesaurus (Lin, 1998)
instead of Wordnet (Miller et al., 1990).
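The feature of Section 6.1.3 can be sketched as follows. The sketch assumes that similar collocations are formed by replacing one word at a time (the verb or the object), which is one reading of the description above; the thesaurus entries and mutual information values are hypothetical.

```python
def least_mi_difference(collocation, similar_words, mi):
    """Smallest |MI(c) - MI(s)| over similar collocations s, or None if none exist."""
    verb, obj = collocation
    # Assumption: substitute one word at a time, keeping the other fixed.
    candidates = [(v, obj) for v in similar_words.get(verb, [])]
    candidates += [(verb, o) for o in similar_words.get(obj, [])]
    differences = [abs(mi[collocation] - mi[s]) for s in candidates if s in mi]
    return min(differences) if differences else None


def feature_values(mi, similar_words):
    """Compute the feature for every collocation, filling gaps with the maximum."""
    raw = {c: least_mi_difference(c, similar_words, mi) for c in mi}
    observed = [value for value in raw.values() if value is not None]
    fallback = max(observed) if observed else 0.0
    return {c: fallback if value is None else value for c, value in raw.items()}


# Hypothetical mutual information values and thesaurus entries.
mi = {
    ("eat", "apple"): 5.0,
    ("eat", "pear"): 4.5,
    ("drink", "tea"): 6.0,
    ("drink", "coffee"): 4.0,
    ("kick", "bucket"): 9.0,
}
similar_words = {
    "apple": ["pear"], "pear": ["apple"],
    "tea": ["coffee"], "coffee": ["tea"],
    "bucket": ["pail"],
}
values = feature_values(mi, similar_words)
```

Here ‘kick bucket’ has no similar collocation in the table (‘kick pail’ is unseen), so it receives the largest difference observed for any other collocation.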
6.1.4 Distributed Frequency of Object ($d$)
The distributed frequency of object is based on the idea that “if
an object appears only with one verb (or few verbs) in a large
corpus, the collocation is expected to have idiomatic nature”
(Tapanainen et al., 1998). For example, ‘sure’ in ‘make sure’
occurs with very few verbs. Hence, ‘sure’ as an object is likely
to give a special sense to the collocation, as it cannot be used
with any verb in general. It is defined as,

$$d(o) = \frac{\sum_{i=1}^{n} f(v_i, o)}{n}$$

where $n$ is the number of verbs occurring with the object ($o$),
$v_i$’s are the verbs co-occurring with $o$ and $f(v_i, o) > t$.
As the number of verbs ($n$) increases, the value of $d(o)$
decreases. Here, $t$ is a threshold which can be set based on the
corpus. This feature treats ‘point finger’ and ‘polish finger’ in
the same way, as it does not use the information specific to the
verb in the collocation. Here, both the collocations will have the
value $d(\mathit{finger})$. The 3 collocations having the highest
value of this feature are (1) come true, (2) become difficult and
(3) make sure.
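A minimal sketch of the distributed frequency of object, assuming the counts are available as a dictionary keyed by (verb, object). The strict comparison against the threshold and all counts below are assumptions made for illustration.

```python
from collections import defaultdict


def distributed_frequency_of_object(pair_freq, threshold=0):
    """Average frequency of each object over the verbs it occurs with.

    Only pairs whose frequency exceeds the threshold are counted.
    """
    freqs_by_object = defaultdict(list)
    for (verb, obj), freq in pair_freq.items():
        if freq > threshold:
            freqs_by_object[obj].append(freq)
    return {obj: sum(freqs) / len(freqs) for obj, freqs in freqs_by_object.items()}


# Invented counts, not taken from the corpus.
pair_freq = {
    ("make", "sure"): 900,
    ("point", "finger"): 120,
    ("polish", "finger"): 4,
    ("lift", "finger"): 56,
}
dfo = distributed_frequency_of_object(pair_freq, threshold=1)
```

The value depends only on the object, so every collocation with ‘finger’ as its object shares the single value `dfo["finger"]`.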
6.1.5 Distributed Frequency of Object using the Verb information
($d_v$)
Here, we have introduced an extension to the distributed frequency
of object (Section 6.1.4) such that collocations like ‘point
finger’ and ‘polish finger’ are treated differently and more
appropriately. This feature is based on the idea that “a
collocation is likely to be idiomatic in nature if there are only
few other collocations with the same object and dissimilar verbs”.
We define this feature as,

$$d_v(v, o) = \frac{\sum_{i=1}^{n} f(v_i, o) \cdot \mathrm{dist}(v, v_i)}{n}$$

where $n$ is the number of verbs occurring with $o$, $v_i$’s are
the verbs co-occurring with $o$ and $f(v_i, o) > t$.
$\mathrm{dist}(v, v_i)$ is the distance between the verb $v$ and
$v_i$. It is calculated using the Wordnet similarity measure
defined by Hirst and St-Onge (Hirst and St-Onge, 1998). In our
experiments, we considered the top-50 verbs which co-occurred with
the object $o$. We used a Perl package, Wordnet::Similarity by
Patwardhan³, to conduct our experiments.
³ http://www.d.umn.edu/~tpederse/similarity.html
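The verb-sensitive variant can be sketched in the same style. The distance function below is a hypothetical lookup table standing in for a distance derived from the Hirst and St-Onge measure; it does not call any WordNet library, and the counts are invented.

```python
def dfo_with_verb(verb, obj, pair_freq, distance, threshold=0, top_k=50):
    """Average of f(v_i, obj) * distance(verb, v_i) over verbs v_i seen with obj."""
    cooccurring = sorted(
        (
            (freq, v)
            for (v, o), freq in pair_freq.items()
            if o == obj and freq > threshold
        ),
        reverse=True,
    )[:top_k]  # keep only the most frequent co-occurring verbs
    if not cooccurring:
        return 0.0
    total = sum(freq * distance(verb, v) for freq, v in cooccurring)
    return total / len(cooccurring)


# Hypothetical verb distances in [0, 1]; a stand-in for a WordNet-based measure.
TOY_DISTANCE = {
    frozenset(("point", "lift")): 0.4,
    frozenset(("point", "polish")): 0.9,
    frozenset(("polish", "lift")): 0.8,
}


def toy_distance(v1, v2):
    """Return 0 for identical verbs, else the hypothetical table entry."""
    return 0.0 if v1 == v2 else TOY_DISTANCE[frozenset((v1, v2))]


pair_freq = {("point", "finger"): 120, ("polish", "finger"): 4, ("lift", "finger"): 56}
point_value = dfo_with_verb("point", "finger", pair_freq, toy_distance)
polish_value = dfo_with_verb("polish", "finger", pair_freq, toy_distance)
```

Unlike the verb-independent feature, the two collocations that share the object ‘finger’ now receive different values.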
6.2 Context based features
Context based measures use the context of a
word/collocation to measure their properties. We
represented the context of a word/collocation using
a LSA model. LSA is a method of representing
words/collocations as points in vector space.
The LSA model we built is similar to that described in (Schutze,
1998) and (Baldwin et al., 2003). First, the 1000 most frequent
content words (i.e., not in the stop-list) were chosen as
“content-bearing words”. Using these content-bearing words as
column labels, the 50,000 most frequent terms in the corpus were
assigned row vectors by counting the number of times they occurred
within the same sentence as content-bearing words. Principal
component analysis was used to determine the principal axes,
giving a $1000 \times 100$ transformation matrix which can be used
to reduce the 1000-dimensional vectors to 100 dimensions.
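The counting step described above can be sketched as follows. This covers only the construction of the raw co-occurrence vectors; the principal component analysis step is omitted. The function name, the tiny corpus and the stop-list are all hypothetical.

```python
from collections import Counter


def cooccurrence_vectors(sentences, stop_words, n_content=1000, n_terms=50000):
    """Count sentence-level co-occurrences of frequent terms with content words."""
    term_freq = Counter(word for sentence in sentences for word in sentence)
    ranked = [word for word, _ in term_freq.most_common()]
    # Column labels: the most frequent words that are not in the stop-list.
    content_words = [w for w in ranked if w not in stop_words][:n_content]
    column = {word: index for index, word in enumerate(content_words)}
    # Row labels: the most frequent terms overall.
    vectors = {term: [0] * len(content_words) for term in ranked[:n_terms]}
    for sentence in sentences:
        for i, term in enumerate(sentence):
            if term not in vectors:
                continue
            for j, other in enumerate(sentence):
                if i != j and other in column:  # skip the token itself
                    vectors[term][column[other]] += 1
    return content_words, vectors


# A three-sentence toy corpus, invented for illustration.
sentences = [
    ["she", "raised", "an", "eyebrow"],
    ["he", "raised", "the", "issue"],
    ["the", "issue", "raised", "an", "eyebrow"],
]
stop_words = {"she", "he", "an", "the"}
content_words, vectors = cooccurrence_vectors(
    sentences, stop_words, n_content=3, n_terms=10
)
```

In a full pipeline these row vectors would then be projected to a lower-dimensional space; treating a collocation as a single term when counting is one possible way to obtain a vector for it, which is an assumption here.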
We will now describe in Sections 6.2.1 and 6.2.2
the features defined using LSA model.
6.2.1 Dissimilarity of the collocation with its constituent verb
using the LSA model ($D$)
If a collocation is highly dissimilar to its constituent verb, it
implies that the usage of the verb in the specific collocation is
not in a general sense. For example, the sense of ‘change’ in
‘change hands’ would be very different from its usual sense.
Hence, the greater the dissimilarity between the collocation and
its constituent verb, the greater the likelihood that it is an
MWE. The feature is defined as

$$D(c, v_c) = 1 - \mathrm{sim}(c, v_c)$$

$$\mathrm{sim}(c, v_c) = \frac{\mathrm{lsa}(c) \cdot \mathrm{lsa}(v_c)}{|\mathrm{lsa}(c)| \cdot |\mathrm{lsa}(v_c)|}$$

where $c$ is the collocation, $v_c$ is the verb of the collocation
and $\mathrm{lsa}(x)$ is the representation of $x$ using the LSA
model.
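The dissimilarity is one minus the cosine of the angle between two vectors, which can be sketched as below. The three-dimensional vectors are made up for illustration; the model described in this paper uses 100-dimensional LSA vectors.

```python
from math import sqrt


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def dissimilarity(collocation_vector, verb_vector):
    """One minus the cosine similarity of a collocation and its verb."""
    return 1.0 - cosine_similarity(collocation_vector, verb_vector)


# Made-up vectors standing in for LSA representations.
lsa = {
    "change hands": [0.1, 0.9, 0.2],
    "change": [0.8, 0.1, 0.3],
    "drink coffee": [0.2, 0.1, 0.9],
    "drink": [0.3, 0.1, 0.8],
}
idiomatic = dissimilarity(lsa["change hands"], lsa["change"])
literal = dissimilarity(lsa["drink coffee"], lsa["drink"])
```

With these invented vectors the idiomatic collocation ends up far from its verb while the literal one stays close to it. The same `cosine_similarity` function applies unchanged to the verb-form feature of Section 6.2.2.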
6.2.2 Similarity of the collocation to the verb-form of the object
using the LSA model ($S_{vo}$)
If a collocation is highly similar to the verb form of an object,
it implies that the verb in the collocation does not contribute
much to the meaning of the collocation. The verb acts as a sort of
support verb, perhaps providing some additional aspectual meaning.
For example, the verb ‘give’ in ‘give a smile’ acts merely as a
support verb. Here, the collocation ‘give a smile’ means the same
as the verb-form of the object, i.e., ‘to smile’. Hence, the
greater the similarity between the collocation and the verb-form
of the object, the greater the likelihood that it is an MWE. This
feature is defined as

$$S_{vo}(c, vo_c) = \frac{\mathrm{lsa}(c) \cdot \mathrm{lsa}(vo_c)}{|\mathrm{lsa}(c)| \cdot |\mathrm{lsa}(vo_c)|}$$

where $c$ is the collocation, $vo_c$ is the verb-form of the
object $o_c$ and $\mathrm{lsa}(x)$ is the representation of $x$
using the LSA model. We obtained the verb-form of the object from
Wordnet (Miller et al., 1990) using its ‘Derived forms’. If the
object doesn’t have a verbal form, the value of this feature is 0.
Table 2 contains the top-6 collocations according to this feature.
All the collocations in Table 2 (except ‘receive award’, which
does not mean the same as ‘to award’) are good examples of MWEs.
Collocation        Value    Collocation          Value
pay visit          0.94     provide assistance   0.92
provide support    0.93     give smile           0.92
receive award      0.92     find solution        0.92
Table 2: Top-6 collocations according to this feature
7 SVM based ranking function/algorithm
The optimal ranking on the training data is computed using the
average ratings of the two annotators. The goal of the learning
function is to model itself according to this ranking. It should
take a ranking function $f$ from a family of ranking functions $F$
that maximizes the empirical $\tau$ (Kendall’s tau). $\tau$
expresses the similarity between the optimal ranking ($r^*$) and
the ranking ($r_f$) computed by the function $f$. SVM-Light⁴ is a
tool developed by Joachims (Joachims, 2002) which provides us such
a function. We briefly describe the algorithm in this section.
Maximizing $\tau$ is equivalent to minimizing the number of
discordant pairs (the pairs of collocations which are not in the
same order as in the optimal ranking). This is equivalent to
finding the weight vector $\vec{w}$ so that the maximum number of
inequalities are fulfilled.

$$\forall (c_i, c_j) \in r^* : \vec{w}\,\Phi(c_i) > \vec{w}\,\Phi(c_j)$$

where $c_i$ and $c_j$ are the collocations, $(c_i, c_j) \in r^*$
if the collocation $c_i$ is ranked higher than $c_j$ for the
optimal ranking $r^*$, $\Phi(c_i)$ and $\Phi(c_j)$ are the mapping
onto features (Section 6) that represent the properties of the V-N
collocations $c_i$ and $c_j$ respectively and $\vec{w}$ is the
weight vector representing the ranking function $f_{\vec{w}}$.
⁴ http://svmlight.joachims.org
Adding SVM regularization for margin maximization to the objective
leads to the following optimization problem (Joachims, 2002).

$$\text{minimize: } V(\vec{w}, \vec{\xi}) = \frac{1}{2}\,\vec{w} \cdot \vec{w} + C \sum \xi_{i,j}$$

$$\text{subject to: } \forall (c_i, c_j) \in r^* : \vec{w}\,(\Phi(c_i) - \Phi(c_j)) \geq 1 - \xi_{i,j}$$

$$\forall i, j : \xi_{i,j} \geq 0$$

where $\xi_{i,j}$ are the (non-negative) slack variables and $C$
is the parameter that allows trading off margin size against
training error. This optimization problem is equivalent to that of
a classification SVM on pairwise difference vectors
$\Phi(c_i) - \Phi(c_j)$. Due to this similarity, it can be solved
using decomposition algorithms similar to those used for SVM
classification (Joachims, 1999).
Using the learnt function $f_{\vec{w}^*}$ ($\vec{w}^*$ is the
learnt weight vector), the collocations in the test set can be
ranked by computing their values using the formula below.

$$rc(c_i) = \vec{w}^{*}\,\Phi(c_i)$$
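The reduction to pairwise difference vectors and the final scoring step can be sketched as below. The sketch does not train an SVM and does not use SVM-Light: the weight vector is a hypothetical stand-in for a learnt one, and the two-dimensional feature vectors and target values are invented.

```python
def pairwise_differences(features, target):
    """Build (difference vector, label) examples from pairs of collocations.

    The label is +1 when the first collocation should rank above the second.
    Pairs with equal target values impose no ordering and are skipped.
    """
    names = sorted(features)
    examples = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if target[a] == target[b]:
                continue
            diff = [x - y for x, y in zip(features[a], features[b])]
            label = 1 if target[a] > target[b] else -1
            examples.append((diff, label))
    return examples


def rank_by_score(features, weights):
    """Order collocations by the linear score w . phi(c), highest first."""
    def score(name):
        return sum(w * x for w, x in zip(weights, features[name]))
    return sorted(features, key=score, reverse=True)


# Hypothetical feature vectors and target ordering (higher = ranked higher).
features = {
    "kick bucket": [9.0, 2.0],
    "take look": [6.0, 1.5],
    "drink coffee": [3.0, 0.2],
}
target = {"kick bucket": 3, "take look": 2, "drink coffee": 1}
examples = pairwise_differences(features, target)

weights = [1.0, 0.5]  # stand-in for a learnt weight vector, not a trained model
ranking = rank_by_score(features, weights)
```

A pair is ordered correctly by a weight vector exactly when the label times the dot product of the weights with the difference vector is positive, which is the condition a binary classifier on these examples would try to satisfy.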
8 Experiments and Results
For training, we used 10% of the data and for testing, we used 90%
of the data, as the goal is to use only a small portion of the
data for training. (The data was divided in 10 different ways for
cross-validation. The results presented here are the average
results.)
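One way to obtain ten different 10%/90% divisions is to cut the shuffled data into ten folds and train on each fold in turn while testing on the other nine. The paper does not state how its divisions were drawn, so the procedure, the function name and the seed below are assumptions.

```python
import random


def small_train_splits(items, n_splits=10, seed=0):
    """Yield (train, test) pairs where each train set is one fold of the data."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_splits] for i in range(n_splits)]
    for k, train in enumerate(folds):
        # Everything outside the current fold is used for testing.
        test = [x for j, fold in enumerate(folds) if j != k for x in fold]
        yield train, test


# 765 collocation indices, matching the size of the annotated set.
splits = list(small_train_splits(range(765)))
```

Averaging the correlation obtained on the ten test sets then gives a single figure per configuration.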
For every statistical measure, a higher value indicates that an
expression is more likely to be non-compositional, so the
expressions are ranked in decreasing order of each measure’s
value. We compare these ranks with the human rankings (obtained
using the average ratings of the annotators). To compare, we use
Pearson’s Rank-Order Correlation Coefficient ($r_s$) (Siegel and
Castellan, 1988).
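A rank-order correlation of this kind can be computed as the Pearson correlation between two vectors of ranks, with tied values sharing their average rank; this quantity is commonly known as Spearman's $r_s$. The sketch below assumes that formulation, and its input lists are invented.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 (smallest), giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of the 1-based positions start..end
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def rank_correlation(x, y):
    """Correlation between the rank vectors of x and y."""
    return pearson(average_ranks(x), average_ranks(y))


# Invented scores for five items.
r_s = rank_correlation([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])
```

Only the ordering of the values matters, so a feature and the human ratings can be compared directly even though they are on different scales.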
We integrate all the seven features using the SVM based ranking
function (described in Section 7). We see that the correlation
between the relative compositionality of the V-N collocations
computed by the SVM based ranking function and the human ranking
is significantly higher than the correlation between the
individual features and the human ranking (Table 3).
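Feature                                                        Correlation
Frequency (f1)                                                 0.129
Point-wise mutual information (f2)                             0.117
Least mutual information difference (f3)                       0.210
Distributed frequency of object (f4)                           0.111
Distributed frequency of object using verb information (f5)   0.203
Dissimilarity of collocation with its verb (f6)                0.139
Similarity of collocation to verb-form of object (f7)         0.300
SVM based ranking function                                     0.448
Table 3: The correlation values of the ranking of individual
features and the ranking of SVM based ranking function with the
ranking of human judgements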
In Table 3, we also see that the contextual feature which we
proposed, ‘Similarity of the collocation to the verb-form of the
object’ ($S_{vo}$), correlated with the human ranking
significantly better than the other features, which indicates that
it is a good measure of the semantic compositionality of V-N
expressions. Other features which were good indicators when
compared to the traditional features are ‘Least mutual information
difference with similar collocations’ ($L$) and ‘Distributed
frequency of object using the verb information’ ($d_v$).
[Figure 1 (plot not reproduced): “Correlation values when features
are integrated”. Correlation (0 to 0.5) is plotted against the
number of features (0 to 8) for two integration orders, Order1 and
Order2; points are labelled f1, f2, f6, f7, f3, f4 and All.]
Figure 1: The change in $r_s$ as more features are added to the
ranking function
To observe the contribution of the features to the SVM based
ranking function, we integrate the features (Section 6) one after
another (in two different orders) and compute the relative order
of the collocations according to their compositionality. We see
that as we integrate more of the relevant compositionality based
features, the relative order correlates better (higher $r_s$
value) with the human ranking (Figure 1). We also see that when
the feature ‘Least mutual information difference with similar
collocations’ is added to the SVM based ranking function, there is
a sharp rise in the correlation value, indicating its relevance.
In Figure 1, we also observe that the context-based features did
not contribute much to the SVM based ranking function even though
they performed well individually.
9 Conclusion
In this paper, we proposed some collocation based and contextual
features to measure the relative compositionality of MWEs of V-N
type. We then integrated the proposed features and the traditional
features using an SVM based ranking function to rank the V-N
collocations based on their relative compositionality. Our main
results are as follows: (1) The properties ‘Similarity of the
collocation to the verb-form of the object’, ‘Least mutual
information difference with similar collocations’ and ‘Distributed
frequency of object using the verb information’ contribute greatly
to measuring the relative compositionality of V-N collocations.
(2) The correlation between the ranks computed by the SVM based
ranking function and the human ranking is significantly better
than the correlation between the rankings of individual features
and the human ranking.
In the future, we will evaluate the effectiveness of the
techniques developed in this paper for applications like Machine
Translation. We will also extend our approach to other types of
MWEs and to the MWEs of other languages (work on Hindi is in
progress).
Acknowledgments
We want to thank the anonymous reviewers for
their extremely useful reviews. We are grateful to
Roderick Saxey and Pranesh Bhargava for annotat-
ing the data which we used in our experiments.
References
Anne Abeille. 1988. Light verb constructions and ex-
traction out of np in a tree adjoining grammar. In Pa-
pers of the 24th Regional Meeting of the Chicago Lin-
guistics Society.
Monoji Akimoto. 1989. Papers of the 24th regional
meeting of the chicago linguistics society. In Shi-
nozaki Shorin.
Timothy Baldwin, Colin Bannard, Takaaki Tanaka, and
Dominic Widdows. 2003. An empirical model of
multiword expression. In Proceedings of the ACL-
2003 Workshop on Multiword Expressions: Analysis,
Acquisition and Treatment.
Colin Bannard, Timothy Baldwin, and Alex Lascarides.
2003. A statistical approach to the semantics of verb-
particles. In Proceedings of the ACL-2003 Workshop
on Multiword Expressions: Analysis, Acquisition and
Treatment.
Joseph D. Becker. 1975. The phrasal lexicon. In Theoretical
Issues of NLP, Workshop in CL, Linguistics,
Psychology and AI, Cambridge, MA.
Daniel M. Bikel. 2004. A distributional analysis of a
lexicalized statistical parsing model. In Proceedings
of EMNLP.
Elisabeth Breidt. 1995. Extraction of v-n-collocations
from text corpora: A feasibility study for german. In
CoRR-1996.
K. Church and Patrick Hanks. 1989. Word association
norms, mutual information, and lexicography. In Pro-
ceedings of the 27th. Annual Meeting of the Associa-
tion for Computational Linguistics, 1990.
K. Church, W. Gale, P. Hanks, and D. Hindle. 1991.
Parsing, word associations and typical predicate-
argument relations. In Current Issues in Parsing Tech-
nology. Kluwer Academic, Dordrecht, Netherlands,
1991.
Ted Dunning. 1993. Accurate methods for the statistics
of surprise and coincidence. In Computational Lin-
guistics - 1993.
Charles Fillmore. 2003. An extremist approach to multi-
word expressions. In A talk given at IRCS, University
of Pennsylvania, 2003.
G. Hirst and D. St-Onge. 1998. Lexical chains as repre-
sentations of context for the detection and correction
of malapropisms. In Fellbaum C., ed., Wordnet: An
electronic lexical database. MIT Press.
Young-Sook Hwang and Yutaka Sasaki. 2005. Context-
dependent SMT model using bilingual verb-noun col-
location. In Proceedings of the 43rd Annual Meet-
ing of the Association for Computational Linguistics
(ACL’05).
T. Joachims. 1999. Making large-scale svm learning
practical. In Advances in Kernel Methods - Support
Vector Learning.
T. Joachims. 2002. Optimizing search engines using
clickthrough data. In Proceedings of the ACM Con-
ference on Knowledge Discovery and Data Mining
(KDD).
Dekang Lin. 1998. Automatic retrieval and clustering of
similar words. In Proceedings of COLING-ACL’98.
Dekang Lin. 1999. Automatic identification of non-
compositional phrases. In Proceedings of ACL-99,
College Park, USA.
D. McCarthy, B. Keller, and J. Carroll. 2003. Detect-
ing a continuum of compositionality in phrasal verbs.
In Proceedings of the ACL-2003 Workshop on Mul-
tiword Expressions: Analysis, Acquisition and Treat-
ment, 2003.
George A. Miller, Richard Beckwith, Christiane Fell-
baum, Derek Gross, and Katherine J. Miller. 1990.
Introduction to wordnet: an on-line lexical database.
In International Journal of Lexicography.
G. Nunberg, I. A. Sag, and T. Wasow. 1994. Idioms. In
Language, 1994.
I. A. Sag, Timothy Baldwin, Francis Bond, Ann Copes-
take, and Dan Flickinger. 2002. Multi-word expres-
sions: a pain in the neck for nlp. In Proceedings of
CICLing , 2002.
Patrick Schone and Dan Jurafsky. 2001. Is knowledge-
free induction of multiword unit dictionary headwords
a solved problem? In Proceedings of EMNLP , 2001.
William Schuler and Aravind K. Joshi. 2004. Relevance
of tree rewriting systems for multi-word expressions.
In To be published.
Hinrich Schutze. 1998. Automatic word-sense discrimi-
nation. In Computational Linguistics.
S. Siegel and N. John Castellan. 1988. In Non-
parametric Statistics of the Behavioral Sciences.
McGraw-Hill, NJ.
Pasi Tapanainen, Jussi Piitulaine, and Timo Jarvinen.
1998. Idiomatic object usage and support verbs. In
36th Annual Meeting of the Association for Computa-
tional Linguistics.
Sriram Venkatapathy and Aravind K. Joshi. 2004.
Recognition of multi-word expressions: A study of
verb-noun (v-n) collocations. In Proceedings of the
International Conference on Natural Language Processing, 2004.
