Proceedings of the Third ACL-SIGSEM Workshop on Prepositions, pages 65–72,
Trento, Italy, April 2006. c©2006 Association for Computational Linguistics
Automatic Identification of English Verb Particle Constructions
using Linguistic Features
Su Nam Kim and Timothy Baldwin
Department of Computer Science and Software Engineering
University of Melbourne, Victoria 3010 Australia
{snkim,tim}@csse.unimelb.edu.au
Abstract
This paper presents a method for identify-
ing token instances of verb particle con-
structions (VPCs) automatically, based on
the output of the RASP parser. The pro-
posed method pools together instances of
VPCs and verb-PPs from the parser out-
put and uses the sentential context of each
such instance to differentiate VPCs from
verb-PPs. We show our technique to per-
form at an F-score of 97.4% at identifying
VPCs in Wall Street Journal and Brown
Corpus data taken from the Penn Tree-
bank.
1 Introduction
Multiword expressions (hereafter MWEs) are
lexical items that can be decomposed into multi-
ple simplex words and display lexical, syntactic
and/or semantic idiosyncracies (Sag et al., 2002;
Calzolari et al., 2002). In the case of English,
MWEs are conventionally categorised syntactico-
semantically into classes such as compound nom-
inals (e.g. New York, apple juice, GM car), verb
particle constructions (e.g. hand in, battle on),
non-decomposable idioms (e.g. a piece of cake,
kick the bucket) and light-verb constructions (e.g.
make a mistake). MWE research has focussed
largely on their implications in language under-
standing, fluency and robustness (Pearce, 2001;
Sag et al., 2002; Copestake and Lascarides, 1997;
Bannard et al., 2003; McCarthy et al., 2003; Wid-
dows and Dorow, 2005). In this paper, our goal
is to identify individual token instances of En-
glish verb particle constructions (VPCs hereafter)
in running text.
For the purposes of this paper, we follow Bald-
win (2005) in adopting the simplifying assump-
tion that VPCs: (a) consist of a head verb and a
unique prepositional particle (e.g. hand in, walk
off); and (b) are either transitive (e.g. hand in, put
on) or intransitive (e.g. battle on). A defining char-
acteristic of transitive VPCs is that they can gen-
erally occur with either joined (e.g. He put on the
sweater) or split (e.g. He put the sweater on) word
order. In the case that the object is pronominal,
however, the VPC must occur in split word order
(c.f. *He handed in it) (Huddleston and Pullum,
2002; Villavicencio, 2003).
The semantics of the VPC can either derive
transparently from the semantics of the head verb
and particle (e.g. walk off ) or be significantly re-
moved from the semantics of the head verb and/or
particle (e.g. look up); analogously, the selectional
preferences of VPCs can mirror those of their head
verbs or alternatively diverge markedly. The syn-
tax of the VPC can also coincide with that of the
head verb (e.g. walk off ) or alternatively diverge
(e.g. lift off ).
In the following, we review relevant past
research on VPCs, focusing on the extrac-
tion/identification of VPCs and the prediction of
the compositionality/productivity of VPCs.
There is a modest body of research on the iden-
tification and extraction of VPCs. Note that in
the case of VPC identification we seek to detect
individual VPC token instances in corpus data,
whereas in the case of VPC extraction we seek
to arrive at an inventory of VPC types/lexical
items based on analysis of token instances in cor-
pus data. Li et al. (2003) identify English VPCs
(or “phrasal verbs” in their parlance) using hand-
coded regular expressions. Baldwin and Villavi-
cencio (2002) extract a simple list of VPCs from
corpus data, while Baldwin (2005) extracts VPCs
with valence information under the umbrella of
deep lexical acquisition.1 The method of Baldwin
(2005) is aimed at VPC extraction and takes into
account only the syntactic features of verbs. In this
paper, our interest is in VPC identification, and we
make use of deeper semantic information.
In Fraser (1976) and Villavicencio (2006) it is
argued that the semantic properties of verbs can
determine the likelihood of their occurrence with
1The learning of lexical items in a form that can be fed
directly into a deep grammar or other richly-annotated lexical
resource
65
particles. Bannard et al. (2003) and McCarthy et
al. (2003) investigate methods for estimating the
compositionality of VPCs based largely on dis-
tributional similarity of the head verb and VPC.
O’Hara and Wiebe (2003) propose a method for
disambiguating the verb sense of verb-PPs. While
our interest is in VPC identification—a fundamen-
tally syntactic task—we draw on the shallow se-
mantic processing employed in these methods in
modelling the semantics of VPCs relative to their
base verbs.
The contribution of this paper is to combine
syntactic and semantic features in the task of VPC
identification. The basic intuition behind the pro-
posed method is that the selectional preferences of
VPCs over predefined argument positions,2 should
provide insight into whether a verb and preposi-
tion in a given sentential context combine to form
a VPC (e.g. Kim handed in the paper) or alter-
natively constitute a verb-PP (e.g. Kim walked in
the room). That is, we seek to identify individual
preposition token instances as intransitive preposi-
tions (i.e. prepositional particles) or transitive par-
ticles based on analysis of the governing verb.
The remainder of the paper is structured as fol-
lows. Section 2 outlines the linguistic features of
verbs and their co-occuring nouns. Section 3 pro-
vides a detailed description of our technique. Sec-
tion 4 describes the data properties and the identi-
fication method. Section 5 contains detailed evalu-
ation of the proposed method. Section 6 discusses
the effectiveness of our approach. Finally, Sec-
tion 7 summarizes the paper and outlines future
work.
2 Linguistic Features
When verbs co-occur with particles to form VPCs,
their meaning can be significantly different from
the semantics of the head verb in isolation. Ac-
cording to Baldwin et al. (2003), divergences in
VPC and head verb semantics are often reflected
in differing selectional preferences, as manifested
in patterns of noun co-occurrence. In one example
cited in the paper, the cosine similarity between
cut and cut out, based on word co-occurrence vec-
tors, was found to be greater than that between cut
and cut off, mirroring the intuitive compositional-
ity of these VPCs.
(1) and (2) illustrate the difference in the selec-
tional preferences of the verb put in isolation as
compared with the VPC put on.3
2Focusing exclusively on the subject and object argument
positions.
3All sense definitions are derived from WordNet 2.1.
(1) put = place
EX: Put the book on the table.
ARGS: bookOBJ = book, publication, object
ANALYSIS: verb-PP
(2) put on = wear
EX: Put on the sweater .
ARGS: sweaterOBJ = garment, clothing
ANALYSIS: verb particle construction
While put on is generally used in the context of
wearing something, it usually occurs with clothing-
type nouns such as sweater and coat, whereas the
simplex put has less sharply defined selectional re-
strictions and can occur with any noun. In terms
of the word senses of the head nouns of the ob-
ject NPs, the VPC put on will tend to co-occur
with objects which have the semantics of clothes
or garment. On the other hand, the simplex verb
put in isolation tends to be used with objects with
the semantics of object and prepositional phrases
containing NPs with the semantics of place.
Also, as observed above, the valence of a VPC
can differ from that of the head verb. (3) and (4)
illustrate two different senses of take off with in-
transitive and transitive syntax, respectively. Note
that take cannot occur as a simplex intransitive
verb.
(3) take off = lift off
EX: The airplane takes off.
ARGS: airplaneSUBJ = airplane, aeroplane
ANALYSIS: verb particle construction
(4) take off = remove
EX: They take off the cape .
ARGS: theySUBJ = person, individual
capeOBJ = garment, clothing
ANALYSIS: verb particle construction
Note that in (3), take off = lift off co-occurs with
a subject of the class airplane, aeroplane. In (4), on
the other hand, take off = remove and the corre-
sponding object noun is of class garment or cloth-
ing. From the above, we can see that head nouns
in the subject and object argument positions can
be used to distinguish VPCs from simplex verbs
with prepositional phrases (i.e. verb-PPs).
66
3 Approach
Our goal is to distinguish VPCs from verb-PPs in
corpus data, i.e. to take individual inputs such as
Kim handed the paper in today and tag each as
either a VPC or a verb-PP. Our basic approach is
to parse each sentence with RASP (Briscoe and
Carroll, 2002) to obtain a first-gloss estimate of
the VPC and verb-PP token instances, and also
identify the head nouns of the arguments of each
VPC and simplex verb. For the head noun of each
subject and object, as identified by RASP, we use
WordNet 2.1 (Fellbaum, 1998) to obtain the word
sense. Finally we build a supervised classifier us-
ing TiMBL 5.1 (Daelemans et al., 2004).
3.1 Method
Compared to the method proposed by Baldwin
(2005), our approach (a) tackles the task of VPC
identification rather than VPC extraction, and (b)
uses both syntactic and semantic features, employ-
ing the WordNet 2.1 senses of the subject and/or
object(s) of the verb. In the sentence He put the
coat on the table, e.g., to distinguish the VPC put
on from the verb put occurring with the preposi-
tional phrase on the table, we identify the senses
of the head nouns of the subject and object(s) of
the verb put (i.e. he and coat, respectively).
First, we parse all sentences in the given corpus
using RASP, and identify verbs and prepositions
in the RASP output. This is a simple process of
checking the POS tags in the most-probable parse,
and for both particles (tagged RP) and transitive
prepositions (taggedII) reading off the governing
verb from the dependency tuple output (see Sec-
tion 3.2 for details). We also retrieved the head
nouns of the subject and object(s) of each head
verb directly from the dependency tuples. Using
WordNet 2.1, we then obtain the word sense of the
head nouns.
The VPCs or verb-PPs are represented with cor-
responding information as given below:
P(type|v,p,wsSUBJ,wsDOBJ,wsIOBJ)
where type denotes either a VPC or verb-PP, v is
the head verb, p is the preposition, and ws* is the
word sense of the subject, direct object or indirect
object.
Once all the data was gathered, we separated it
into test and training data. We then used TiMBL
5.1 to learn a classifier from the training data,
which was then run and evaluated over the test
data. See Section 5 for full details of the results.
Figure 1 depicts the complete process used to
distinguish VPCs from verb-PPs.
textraw
Particles Objects
Senses
corpus
Subjects
WordNet
Word
v+p with Semantics
Verbs
TiMBL Classifier
look_after := [..
put_on := [..
take_off := [..
e.g.
Preprocessing RASPparser
Figure 1: System Architecture
3.2 On the use of RASP, WordNet and
TiMBL
RASP is used to identify the syntactic structure
of each sentence, including the head nouns of ar-
guments and first-gloss determination of whether
a given preposition is incorporated in a VPC or
verb-PP. The RASP output contains dependency
tuples derived from the most probable parse, each
of which includes a label identifying the nature
of the dependency (e.g. SUBJ, DOBJ), the head
word of the modifying constituent, and the head of
the modified constituent. In addition, each word
is tagged with a POS tag from which it is possi-
ble to determine the valence of any prepositions.
McCarthy et al. (2003) evaluate the precision of
RASP at identifying VPCs to be 87.6% and the re-
call to be 49.4%. However the paper does not eval-
uate the parser’s ability to distinguish sentences
containing VPCs and sentences with verb-PPs.
To better understand the baseline performance
of RASP, we counted the number of false-positive
examples tagged with RP and false-negative ex-
amples tagged with II, relative to gold-standard
data. See Section 5 for details.
We use WordNet to obtain the first-sense word
sense of the head nouns of subject and object
phrases, according to the default word sense rank-
ing provided within WordNet. McCarthy et al.
(2004) found that 54% of word tokens are used
with their first (or default) sense. With the per-
formance of current word sense disambiguation
(WSD) systems hovering around 60-70%, a sim-
ple first-sense WSD system has room for improve-
ment, but is sufficient for our immediate purposes
67
in this paper.
To evaluate our approach, we built a super-
vised classifier using the TiMBL 5.1 memory-
based learner and training data extracted from the
Brown and WSJ corpora.
4 Data Collection
We evaluated out method by running RASP over
Brown Corpus and Wall Street Journal, as con-
tained in the Penn Treebank (Marcus et al., 1993).
4.1 Data Classification
The data we consider is sentences containing
prepositions tagged as either RP or II. Based on
the output of RASP, we divide the data into four
groups:
Group AGroup BGroup C
RP & II tagged dataRP tagged data II tagged data
Group D
Group A contains the verb–preposition token
instances tagged tagged exclusively as VPCs (i.e.
the preposition is never tagged as II in combi-
nation with the given head verb). Group B con-
tains the verb–preposition token instances iden-
tified as VPCs by RASP where there were also
instances of that same combination identified as
verb-PPs. Group C contains the verb–preposition
token instances identified as verb-PPs by RASP
where there were also instances of that same com-
bination identified as VPCs. Finally, group D
contains the verb-preposition combinations which
were tagged exclusively as verb-PPs by RASP.
We focus particularly on disambiguating verb–
preposition token instances falling into groups B
and C, where RASP has identified an ambiguity
for that particular combination. We do not further
classify token instances in group D, on the grounds
that (a) for high-frequency verb–preposition com-
binations, RASP was unable to find a single in-
stance warranting a VPC analysis, suggesting it
had high confidence in its ability to correctly iden-
tify instances of this lexical type, and (b) for low-
frequency verb–preposition combinations where
the confidence of there definitively no being a
VPC usage is low, the token sample is too small
to disambiguate effectively and the overall impact
would be negligible even if we tried. We do, how-
ever, return to considered data in group D in com-
puting the precision and recall of RASP.
Naturally, the output of RASP parser is not
error-free, i.e. VPCs may be parsed as verb-PPs
FPR FNR Agreement
Group A 4.08% — 95.24%
Group B 3.96% — 99.61%
Group C — 10.15% 93.27%
Group D — 3.4% 99.20%
Table 1: False positive rate (FPR), false negative
rate (FNR) and inter-annotator agreement across
the four groups of token instances
f ≥ 1 f ≥ 5
VPC V-PP VPC V-PP
Group A 5,223 0 3,787 0
Group B 1,312 0 1,108 0
Group C 0 995 0 217
Total 6,535 995 4,895 217
Table 2: The number of VPC and verb-PP token
instances occurring in groups A, B and C at vary-
ing frequency cut-offs
and vice versa. In particular, other than the re-
ported results of McCarthy et al. (2003) targeting
VPCs vs. all other analyses, we had no a priori
sense of RASP’s ability to distinguish VPCs and
verb-PPs. Therefore, we manually checked the
false-positive and false-negative rates in all four
groups and obtained the performance of parser
with respect to VPCs. The verb-PPs in group A
and B are false-positives while the VPCs in group
C and D are false-negatives (we consider the VPCs
to be positive examples).
To calculate the number of incorrect examples,
two human annotators independently checked
each verb–preposition instance. Table 1 details the
rate of false-positives and false-negative examples
in each data group, as well as the inter-annotator
agreement (calculated over the entire group).
4.2 Collection
We combined together the 6,535 (putative) VPCs
and 995 (putative) verb-PPs from groups A, B and
C, as identified by RASP over the corpus data. Ta-
ble 2 shows the number of VPCs in groups A and
B and the number of verb-PPs in group C. The
first number is the number of examples occuring
at least once and the second number that of exam-
ples occurring five or more times.
From the sentences containing VPCs and verb-
PPs, we retrieved a total of 8,165 nouns, including
68
Type Groups A&B Group C
common noun 7,116 1,239
personal pronoun 629 79
demonstrative pronoun 127 1
proper noun 156 18
who 94 6
which 32 0
No sense (what) 11 0
Table 3: Breakdown of subject and object head
nouns in group A&B, and group C
pronouns (e.g. I, he, she), proper nouns (e.g. CITI,
Canada, Ford) and demonstrative pronouns (e.g.
one, some, this), which occurred as the head noun
of a subject or object of a VPC in group A or B.
We similarly retrieved 1,343 nouns for verb-PPs in
group C. Table 3 shows the distribution of different
noun types in these two sets.
We found that about 10% of the nouns are pro-
nouns (personal or demonstrative), proper nouns
or WH words. For pronouns, we manually re-
solved the antecedent and took this as the head
noun. When which is used as a relative pronoun,
we identified if it was coindexed with an argument
position of a VPC or verb-PP, and if so, manually
identified the antecedent, as illustrated in (5).
(5) EX: Tom likes the books which he sold off.
ARGS: heSUBJ = person
whichOBJ = book
With what, on the other hand, we were gener-
ally not able to identify an antecedent, in which
case the argument position was left without a word
sense (we come back to this in Section 6).
(6) Tom didn’t look up what to do.
What went on?
We also replaced all proper nouns with cor-
responding common noun hypernyms based on
manual disambiguation, as the coverage of proper
nouns in WordNet is (intentionally) poor. The fol-
lowing are examples of proper nouns and their
common noun hypernyms:
Proper noun Common noun hypernym
CITI bank
Canada country
Ford company
Smith human
produce, green goods, ...food(3rd)
...
reproductive structure...
pome, false fruit
reproductive structurefruit
fruit(2nd)
citrus, citrus fruit, citrous fruitedible fruit(2nd)
edible fruit(1st)apple
Sense 1
Sense 1orange
produce, green goods, ...food(4th)
...
..fruit(3rd)
Figure 2: Senses of apple and orange
When we retrieved the first word sense of nouns
from WordNet, we selected the first sense and the
associated hypernyms (up to) three levels up the
WordNet hierarchy. This is intended as a crude
form of smoothing for closely-related word senses
which occur in the same basic region of the Word-
Net hierarchy. As an illustration of this process,
in Figure 2, apple and orange are used as edi-
ble fruit, fruit or food, and the semantic overlap is
picked up on by the fact that edible fruit is a hy-
pernym of both apple and orange. On the other
hand, food is the fourth hypernym for orange so it
is ignored by our method. However, because we
use the four senses, the common senses of nouns
are extracted properly. This approach works rea-
sonably well for retrieving common word senses
of nouns which are in the immediate vicinity of
each other in the WordNet hierarchy, as was the
case with apple and orange. In terms of feature
representation, we generate an individual instance
for each noun sense generated based on the above
method, and in the case that we have multiple ar-
guments for a given VPC or verb-PP (e.g. both a
subject and a direct object), we generate an indi-
vidual instance for the cross product of all sense
combinations between the arguments.
We use 80% of the data for training and 20%
for testing. The following is the total number of
training instances, before and after performing hy-
pernym expansion:
Training Instances
Before expansion After expansion
Group A 5,223 24,602
Group B 1,312 4,158
Group C 995 5,985
69
Group Frequency of VPCs Size
B (f≥1 ) test:272
(f≥5 ) train:1,040
BA (f≥1 & f≥1) test:1,327
(f≥5 & f≥5) train:4,163
BC (f≥1 & f≥1) test:498
(f≥5 & f≥1) train:1,809
BAC (f≥1 & f≥1 & f≥1 ) test:1,598
(f≥5 & f≥5 & f≥1 ) train:5,932
Table 4: Data set sizes at different frequency cut-
offs
5 Evaluation
We selected 20% of the test data from different
combinations of the four groups and over the two
frequency thresholds, leading to a total of 8 test
data sets. The first data set contains examples from
group B only, the second set is from groups B and
A, the third set is from groups B and C, and the
fourth set is from groups B, A and C. Addition-
ally, each data set is divided into: (1) f ≥ 1, i.e.
verb–preposition combinations occurring at least
once, and (2) f ≥ 5, i.e. verb–preposition com-
binations occurring at least five times (hereafter,
f ≥ 1 is labelled f≥1 and f ≥ 5 is labelled f≥5 ).
In the group C data, there are 217 verb-PPs with
f≥5 , which is slightly more than 20% of the data
so we use verb-PPs with f≥1 for experiments in-
stead of verb-PP with f≥5. The first and second
data sets do not contain negative examples while
the third and fourth data sets contain both positive
and negative examples. As a result, the precision
for the first two data sets is 1.0.
Table 5 shows the precision, recall and F-score
of our method over each data set, relative to the
identification of VPCs only. A,B,C are groups and
f# is the frequency of examples.
Table 6 compares the performance of VPC iden-
tification and verb-PP identification.
Table 7 indicates the result using four word
senses (i.e. with hypernym expansion) and only
one word sense (i.e. the first sense only).
6 Discussion
The performance of RASP as shown in Tables 5
and 6 is based on human judgement. Note that
we only consider the ability of the parser to distin-
guish sentences with prepositions as either VPCs
or verb-PPs (i.e. we judge the parse to be correct if
the preposition is classified correctly, irrespective
of whether there are other errors in the output).
Data Freq P R F
RASP f≥1 .959 .955 .957
B f≥1 1.0 .819 .901
f≥5 1.0 .919 .957
BA f≥1 f≥1 1.0 .959 .979
f≥5 f≥5 1.0 .962 .980
BC f≥1 f≥1 .809 .845 .827
f≥5 f≥1 .836 .922 .877
BAC f≥1 f≥1 f≥1 .962 .962 .962
f≥5 f≥5 f≥1 .964 .983 .974
Table 5: Results for VPC identification only (P =
precision, R = recall, F = F-score)
Data Freq Type P R F
RASP f≥1 P+V .933 – –
BC f≥1 f≥1 P+V .8068 .8033 .8051
f≥5 f≥1 P+V .8653 .8529 .8591
BAC f≥1 f≥1 P+V .8660 .8660 .8660
f≥5 f≥1 P+V .9272 .8836 .9054
Table 6: Results for VPC (=V) and verb-PP (=P)
identification (P = precision, R = recall, F = F-
score)
Also, we ignore the ambiguity between particles
and adverbs, which is the principal reason for our
evaluation being much higher than that reported
by McCarthy et al. (2003). In Table 5, the preci-
sion (P) and recall (R) for VPCs are computed as
follows:
P = Data Correctly Tagged as VPCsData Retrieved as VPCs
R = Data Correctly Tagged as VPCsAll VPCs in Data Set
The performance of RASP in Table 6 shows
how well it distinguishes between VPCs and verb-
PPs for ambiguous verb–preposition combina-
tions. Since Table 6 shows the comparative per-
formance of our method between VPCs and verb-
PPs, the performance of RASP with examples
which are misrecognized as each other should be
the guideline. Note, the baseline RASP accuracy,
based on assigning the majority class to instances
in each of groups A, B and C, is 83.04%.
In Table 5, the performance over high-
frequency data identified from groups B, A and
C is the highest (F-score = .974). In general, we
would expect the data set containing the high fre-
quency and both positive and negative examples
70
Freq Type # P R F
f≥1 V 4WS .962 .962 .962
1WS .958 .969 .963
f≥1 P 4WS .769 .769 .769
1WS .800 .743 .770
f≥5 V 4WS .964 .983 .974
1WS .950 .973 .962
f≥5 P 4WS .889 .783 .832
1WS .813 .614 .749
Table 7: Results with hypernym expansion (4WS)
and only the first sense (1WS), in terms of preci-
sion (P), recall (R) and F-score (F)
to give us the best performance at VPC identifi-
cation. We achieved a slightly better result than
the 95.8%-97.5% performance reported by Li et
al. (2003). However, considering that Li et al.
(2003) need considerable time and human labour
to generate hand-coded rules, our method has ad-
vantages in terms of both raw performance and
labour efficiency.
Combining the results for Table 5 and Table 6,
we see that our method performs better for VPC
identification than verb-PP identification. Since
we do not take into account the data from group
D with our method, the performance of verb-PP
identification is low compared to that for RASP,
which in turn leads to a decrement in the overall
performance.
Since we ignored the data from group D con-
taining unambiguous verb-PPs, the number of pos-
itive training instances for verb-PP identification
was relatively small. As for the different number
of word senses in Table 7, we conclude that the
more word senses the better the performance, par-
ticularly for higher-frequency data items.
In order to get a clearer sense of the impact of
selectional preferences on the results, we investi-
gated the relative performance over VPCs of vary-
ing semantic compositionality, based on 117 VPCs
(f≥1 ) attested in the data set of McCarthy et al.
(2003). According to our hypothesis from above,
we would expect VPCs with low composition-
ality to have markedly different selectional pref-
erences to the corresponding simplex verb, and
VPCs with high compositionality to have similar
selectional preferences to the simplex verb. In
terms of the performance of our method, therefore,
we would expect the degree of compositionality
to be inversely proportional to the system perfor-
mance. We test this hypothesis in Figure 3, where
we calculate the error rate reduction (in F-score)
 0
 20
 40
 60
 80
 100
 0  1  2  3  4  5  6  7  8  9  10 0
 20
 40
 60
 80
 100
Error Rate Reduction (%)
Types
Compositionality
Figure 3: Error rate reduction for VPCs of varying
compositionality
for the proposed method relative to the majority-
class baseline, at various degrees of composition-
ality. McCarthy et al. (2003) provides compo-
sitionality judgements from three human judges,
which we take the average of and bin into 11 cate-
gories (with 0 = non-compositional and 10 = fully
compositional). In Figure 3, we plot both the er-
ror rate reduction in each bin (both the raw num-
bers and a smoothed curve), and also the number
of attested VPC types found in each bin. From
the graph, we see our hypothesis born out that,
with perfect performance over non-compositional
VPCs and near-baseline performance over fully
compositional VPCs. Combining this result with
the overall results from above, we conclude that
our method is highly successful at distinguishing
non-compositional VPCs from verb-PPs, and fur-
ther that there is a direct correlation between the
degree of compositionality and the similarity of
the selectional preferences of VPCs and their verb
counterparts.
Several factors are considered to have influ-
enced performance. Some data instances are miss-
ing head nouns which would assist us in determin-
ing the semantics of the verb–preposition combi-
nation. Particular examples of this are imperative
and abbreviated sentences:
(7) a. Come in.
b. (How is your cold?) Broiled out.
Another confounding factor is the lack of word
sense data, particularly in WH questions:
(8) a. What do I hand in?
b. You can add up anything .
71
7 Conclusion
In this paper, we have proposed a method for iden-
tifying VPCs automatically from raw corpus data.
We first used the RASP parser to identify VPC
and verb-PP candidates. Then, we used analysis of
the head nouns of the arguments of the head verbs
to model selectional preferences, and in doing so,
distinguish between VPCs and verb-PPs. Using
TiMBL 5.1, we built a classifier which achieved
an F-score of 97.4% at identifying frequent VPC
examples. We also investigated the comparative
performance of RASP at VPC identification.
The principal drawback of our method is that it
relies on the performance of RASP and we assume
a pronoun resolution oracle to access the word
senses of pronouns. Since the performance of such
systems is improving, however, we consider our
approach to be a promising, stable method of iden-
tifying VPCs.
Acknowledgements
This material is based upon work supported in part by the
Australian Research Council under Discovery Grant No.
DP0663879 and NTT Communication Science Laboratories,
Nippon Telegraph and Telephone Corporation. We would
like to thank the three anonymous reviewers for their valu-
able input on this research.

References
Timothy Baldwin and Aline Villavicencio. 2002. Extract-
ing the unextractable: A case study on verb-particles. In
Proc. of the 6th Conference on Natural Language Learn-
ing (CoNLL-2002), pages 98–104, Taipei, Taiwan.
Timothy Baldwin, Colin Bannard, Takaaki Tanaka, and Do-
minic Widdows. 2003. An empirical model of multiword
expression decomposability. In Proc. of the ACL-2003
Workshop on Multiword Expressions: Analysis, Acquisi-
tion and Treatment, pages 89–96, Sapporo, Japan.
Timothy Baldwin. 2005. The deep lexical acquisition of
English verb-particle constructions. Computer Speech
and Language, Special Issue on Multiword Expressions,
19(4):398–414.
Colin Bannard, Timothy Baldwin, and Alex Lascarides.
2003. A statistical approach to the semantics of verb-
particles. In Proc. of the ACL-2003 Workshop on Multi-
word Expressions: Analysis, Acquisition and Treatment,
pages 65–72, Sapporo, Japan.
Ted Briscoe and John Carroll. 2002. Robust accurate statisti-
cal annotation of general text. In Proc. of the 3rd Interna-
tional Conference on Language Resources and Evaluation
(LREC 2002), pages 1499–1504, Las Palmas, Canary Is-
lands.
Nicoletta Calzolari, Charles Fillmore, Ralph Grishman,
Nancy Ide, Alessandro Lenci, Catherine MacLeod, and
Antonio Zampolli. 2002. Towards best practice for mul-
tiword expressions in computational lexicons. In Proc. of
the 3rd International Conference on Language Resources
and Evaluation (LREC 2002), pages 1934–40, Las Pal-
mas, Canary Islands.
Ann Copestake and Alex Lascarides. 1997. Integrating sym-
bolic and statistical representations: The lexicon pragmat-
ics interface. In Proc. of the 35th Annual Meeting of the
ACL and 8th Conference of the EACL (ACL-EACL’97),
pages 136–43, Madrid, Spain.
Walter Daelemans, Jakub Zavrel, Ko van der Sloot, and An-
tal van den Bosch. 2004. TiMBL: Tilburg Memory Based
Learner, version 5.1, Reference Guide. ILK Technical Re-
port 04-02.
Christiane Fellbaum, editor. 1998. WordNet: An Electronic
Lexical Database. MIT Press, Cambridge, USA.
B. Fraser. 1976. The Verb-Particle Combination in English.
The Hague: Mouton.
Rodney Huddleston and Geoffrey K. Pullum. 2002. The
Cambridge Grammar of the English Language. Cam-
bridge University Press, Cambridge, UK.
Wei Li, Xiuhong Zhang, Cheng Niu, Yuankai Jiang, and Ro-
hini K. Srihari. 2003. An expert lexicon approach to iden-
tifying English phrasal verbs. In Proc. of the 41st Annual
Meeting of the ACL, pages 513–20, Sapporo, Japan.
Mitchell P. Marcus, Beatrice Santorini, and Mary Ann
Marcinkiewicz. 1993. Building a large annotated corpus
of English: the Penn treebank. Computational Linguis-
tics, 19(2):313–30.
Diana McCarthy, Bill Keller, and John Carroll. 2003. De-
tecting a continuum of compositionality in phrasal verbs.
In Proc. of the ACL-2003 Workshop on Multiword Ex-
pressions: Analysis, Acquisition and Treatment, Sapporo,
Japan.
Diana McCarthy, Rob Koeling, Julie Weeds, and John Car-
roll. 2004. Finding predominant senses in untagged text.
In Proc. of the 42nd Annual Meeting of the ACL, pages
280–7, Barcelona, Spain.
Tom O’Hara and Janyce Wiebe. 2003. Preposition semantic
classification via Treebank and FrameNet. In Proc. of the
7th Conference on Natural Language Learning (CoNLL-
2003), pages 79–86, Edmonton, Canada.
Darren Pearce. 2001. Synonymy in collocation extraction.
In Proceedings of the NAACL 2001 Workshop on WordNet
and Other Lexical Resources: Applications, Extensions
and Customizations, Pittsburgh, USA.
Ivan A. Sag, Timothy Baldwin, Francis Bond, Ann Copes-
take, and Dan Flickinger. 2002. Multiword expressions:
A pain in the neck for NLP. In Proc. of the 3rd Interna-
tional Conference on Intelligent Text Processing and Com-
putational Linguistics (CICLing-2002), pages 1–15, Mex-
ico City, Mexico.
Aline Villavicencio. 2003. Verb-particle constructions and
lexical resources. In Proc. of the ACL-2003 Workshop on
Multiword Expressions: Analysis, Acquisition and Treat-
ment, pages 57–64, Sapporo, Japan.
Aline Villavicencio. 2006. Verb-particle constructions in the
world wide web. In Patrick Saint-Dizier, editor, Compu-
tational Linguistics Dimensions of Syntax and Semantics
of Prepositions. Springer, Dordrecht, Netherlands.
Dominic Widdows and Beate Dorow. 2005. Automatic ex-
traction of idioms using graph analysis and asymmetric
lexicosyntactic patterns. In Proc. of the ACL-SIGLEX
2005 Workshop on Deep Lexical Acquisition, pages 48–
56, Ann Arbor, USA.
