Probabilistic Models of Verb-Argument Structure
Daniel Gildea
Dept. of Computer and Information Science
University of Pennsylvania
dgildea@cis.upenn.edu
Abstract
We evaluate probabilistic models of verb argument
structure trained on a corpus of verbs and their syn-
tactic arguments. Models designed to represent pat-
terns of verb alternation behavior are compared with
generic clustering models in terms of the perplexity
assigned to held-out test data. While the special-
ized models of alternation do not perform as well,
closer examination reveals alternation behavior rep-
resented implicitly in the generic models.
1 Introduction
Recent research into verb-argument structure has
has attempted to acquire the syntactic alternation
behavior of verbs directly from large corpora. Mc-
Carthy (2000), Merlo and Stevenson (2001), and
Schulte im Walde (2000) have evaluated their sys-
tems’ accuracy against human judgments of verb
classification, with the comprehensive verb classes
of Levin (1993) often serving as a gold standard.
Another area of research has focused on automatic
clustering algorithms for verbs and their arguments
with the goal of finding groups of semantically re-
lated words (Pereira et al., 1993; Rooth et al., 1999),
without focusing specifically on alternation behav-
ior. We aim to bring these strands of research to-
gether with a unified probabilistic model of verb ar-
gument structure incorporating alternation behavior.
Unraveling the mapping between syntactic func-
tions such as subject and object and semantic roles
such as agent and patient is an important piece
of the language understanding problem. Learn-
ing the alternation behavior of verbs automatically
from unannotated text would significantly reduce
the amount of labor needed to create text under-
standing systems, whether that labor takes the form
of writing lexical entries or of annotating semantic
information to train statistical systems.
Our use of generative probabilistic models of ar-
gument structure also allows for language modeling
applications independent of semantic interpretation.
Language models based on head-modifier lexical
dependencies in syntactic trees have been shown to
have lower perplexity than D2-gram language models
and to reduce word-error rates for speech recogni-
tion (Chelba and Jelinek, 1999; Roark, 2001). In-
corporating semantic classes and verb alternation
behavior could improve such models’ performance.
Automatically derived word clusters are used in the
statistical parsers of Charniak (1997) and Mager-
man (1995). Incorporating alternation behavior into
such models might improve parsing results as well.
This paper focuses on evaluating probabilistic
models of verb-argument structure in terms of how
well they model unseen test data, as measured by
perplexity. We will examine maximum likelihood
bigram and trigram models, clustering models based
on those of Rooth et al. (1999), as well as a new
probabilistic model designed to capture alternations
in verb-argument structure.
2 Capturing Alternation Behavior
Automatic clustering of co-occurrences of verbs and
their direct objects was first used to induce se-
mantically related classes of both verbs and nouns
(Pereira et al., 1993). Rooth et al. (1999) used
the Expectation Maximization algorithm to perform
soft clustering by optimizing the parameters of a
fairly simple probability model, which considers the
verb and noun to be independent given the unob-
served cluster variable CR:
C8B4DABND2B5BP
CG
CR
C8B4CRB5C8B4DACYCRB5C8B4D2CYCRB5
In Rooth et al. (1999), the variable DA represented not
only the lexical verb but also its syntactic relation to
the noun: either direct object, subject of an intransi-
tive, or subject of a transitive verb.
However, the relationship between the underly-
ing, semantic arguments of a verb and the syntac-
tic roles in a sentence is not always straightforward.
Many verbs exhibit alternations in their syntactic
behavior, as shown by the following examples:
(1) The Federal Reserve increased rates by 1/4%.
(2) Interest rates have increased sharply over the
past year.
The noun rates appears as the syntactic object
of the verb increase in the first sentence, but
as its subject in the second sentence, where the
verb is used intransitively, that is, without an ob-
ject. One of the clusters found by the model of
Rooth et al. (1999) corresponded to “verb of scalar
change” such as increase, rise,anddecrease.The
model places both subject-of-intransitive-increase
and direct-object-of-increase in this class, but does
not explicitly capture the fact that these to values
represent different uses of the same verb.
The phenomenon of verb argument alternations
has been most comprehensively studied by Levin
(1993), who catalogs over 3,000 verbs into classes
according to which alternations they participate in.
A central thesis of Levin’s work is that a verb’s syn-
tactic alternations are related to its semantics, and
that semantically related verb will share the same
alternations. For example, the alternation of exam-
ples 1 and 2 is shared by verbs such as decrease and
diminish.
Table 1 gives the most common nouns occurring
as arguments of selected verbs in our corpus, show-
ing how alternation behavior shows up in corpus
statistics. The verbs open and increase, classified
by Levin and others as exhibiting a causative al-
ternation between transitive and intransitive usages,
share many of the same nouns in direct object and
subject-of-intransitive positions, as we would ex-
pect. For example, number, cost,andrate occur
among the ten most common nouns in both posi-
tions for increase, and themselves seem semanti-
cally related. For open, the first three words in either
position are the same. For the verb play, on the other
hand, classified as an “object-drop” verb by Merlo
and Stevenson (2001), we would expect overlap be-
tween the subject of transitive and intransitive uses.
This is in fact the case, with child, band,andteam
appearing among the top ten nouns for both posi-
tions. However, play also exhibits an alternation be-
tween the direct object and subject of intransitive
positions for music, role,andgame. These two sets
of nouns seem to fill different semantic roles of the
verb, the first set being agents and the second be-
ing themes. This example illustrate the complex in-
teraction between verb sense and alternation behav-
ior: “The band played” and the “The music played”
are considered to belong to different senses of play
by WordNet (Fellbaum, 1998) and other word sense
inventories. However, it is interesting to note that
nouns from both the broad senses of play,“playa
game” and “play music”, participate in both alter-
nations. An advantage of our EM-based soft clus-
tering algorithm is that it can assign a verb to mul-
tiple clusters; ideally, we would hope that a verb’s
clusters would correspond to its senses.
We expect verbs which take similar sets of argu-
ment fillers to be semantically related, and to par-
ticipate in the same alternations. This idea has been
used by McCarthy (2000) to identify verbs partici-
pating in specific alternations by looking for over-
lap between nouns used in different positions, and
by using WordNet to classify role fillers into se-
mantic categories. Schulte im Walde (2000) uses
an EM-based automatic clustering of verbs to at-
tempt to derive Levin classes from unlabeled data.
As in McCarthy (2000), the nouns are classified us-
ing WordNet. However, the appearance of the same
noun in different syntactic positions is not explicitly
captured by the probability model used for cluster-
ing.
This observation motivated a new probabilistic
model of verb argument structure designed to ex-
plicitly capture alternation behavior. In addition to
an unobserved cluster variable CR, we introduce a sec-
ond unobserved variable D6 for the semantic role of
an argument. The role D6 is dependent on both the
cluster CR to which our verb-noun pair belongs, and
the syntactic slot D7 in which the noun is found, and
the probability of an observed triple C8B4DABND7BND2B5 is es-
timated as:
CG
CRBND6
C8B4CRB5C8B4DACYCRB5C8B4D7CYCRB5C8B4D6CYCRBND7B5C8B4D2CYD6BNCRB5
The noun is independent of the verb given the clus-
ter variable, as before, and the noun is independent
of the syntactic slot D7 given the cluster CR and the se-
mantic role D6. The semantic role variable D6 can take
two values, with C8B4D6CYCRBND7B5 representing the mapping
from syntax to semantic role for a cluster of verbs.
We expect the clusters to consist of verbs that not
only appear with the same set of nouns, but share the
same mapping from syntactic position to semantic
role. For example increase and decrease might be-
long to same cluster as they both appear frequently
Verb Object Subj of Intransitive Subj of Transitive
close door door troop
eyes eyes door
mouth mouth police
firebreak exhibition gunman
way shop woman
possibility show man
gate trial guard
account conference soldier
window window one
shop gate company
increase risk number government
number proportion increase
share population use
profit rate effect
lead pressure sale
pressure amount level
rate cost presence
likelihood sale Party
chance rates Labour
cost profit bank
play part child band
role band factor
game team England
host role child
music player people
card game woman
piano smile man
tennis people team
parts music all
guitar boy group
Table 1: Examples from the corpus: most common arguments for selected verbs
with rate, number,andprice in both the direct ob-
ject and subject of intransitive slots, and would as-
sign the same value of D6 to both positions. The verb
lower might belong to a different cluster because,
although it appears with the same nouns, they ap-
pear as the direct object but not as the subject.
The Expectation Maximization algorithm is used
to train the model from the corpus, iterating over an
Expectation step in which expected values for the
two unobserved variables CR and D6 are calculated for
each observation in the training data, and a Maxi-
mization step in which the parameter of each of the
five distributions C8B4CRB5, C8B4DACYCRB5, C8B4D7CYCRB5, C8B4D6CYCRBND7B5,
and C8B4D2CYD2BNCRB5 are set to maximize the likelihood of
the data given the expectations for CR and D6.
3TheData
For our experiments we used a version of the British
National Corpus parsed with the statistical parser of
Collins (1997). Subject and direct object relations
were extracted by searching for NP nodes domi-
nated by S and VP nodes respectively. The head
words of the resulting subject and object nodes were
found using the deterministic headword rules em-
ployed by the parsing model. The individual obser-
vations of our dataset are noun-verb pairs of three
types: direct object, subject of a verb with an ob-
ject, and subject of a verb without an object. As a
result, the subject and object relations of the same
original sentence are considered independently by
all of the models we examine.
Direct object noun phrases were assigned the
function tags of the Treebank-2 annotation style
(Marcus et al., 1994) in order to distinguish noun
phrases such as temporal adjuncts from true direct
objects. For example, in the sentence “He ate yes-
terday”, yesterday would be assigned the Temporal
tag, and therefore not considered a direct object for
our purposes. Similarly, in the sentence “Interest
rates rose 2%”, 2% would be assigned the Extent
tag, and this instance of rise would be considered
intransitive.
Function tags were assigned using a simple prob-
ability model trained on the Wall Street Journal data
from the Penn Treebank, in a technique similar to
that of Blaheta and Charniak (2000). The model
predicts the function tag conditioned on the verb and
head noun of the noun phrase:
C8B4CUCYDABND2B5BP
B4
DI
C8B4CUCYDABND2B5 B4DABND2B5 BE CC
BD
BE
DI
C8B4CUCYDAB5B7
BD
BE
DI
C8B4CUCYD2B5 otherwise
where CU ranges over the function tags defined (Mar-
cus et al., 1994), or the null tag. Only cases assigned
the null tag by this model were considered true di-
rect objects. Evaluated on the binary task of whether
to assign a function tag to noun phrases in object
position, this classifier was correct 95% of the time
on held-out data from the Wall Street Journal. By
never assigning a function tag, one would achieve
85% accuracy. While we have no way to evaluate
its accuracy on the British National Corpus, certain
systematic errors are apparent. For example, while
it classifies 2% as an Extent in “Interest rates in-
creased 2%”, it assigns no tag to crack in “The door
opened a crack”. This type of error leads to the ap-
pearance of door as a subject on transitive uses of
open in Table 1.
Both verbs and nouns were lemmatized using the
XTAG morphological dictionary (XTAG Research
Group, 2001). As we wished to focus on alternation
behavior, verbs that were used intransitively than
90% of the time were excluded from the data; we
envision that they would be handled by a separate
probability model. Pronouns were excluded from
the dataset, as were verbs and nouns that occurred
fewer than 10 times, resulting in a vocabulary of
4,456 verbs and 17,345 nouns. The resulting dataset
consisted of 1,372,111 triples of verb, noun, and
syntactic relation. Of these, 90% were used as train-
ing material, 5% were used as a cross-validation set
for setting linear interpolation and deterministic an-
nealing parameters, and 5% were used as test data
for the results reported below.
4 The Models
We compare performance of a number of probabil-
ity models for our verb argument data in order to
explore the dependencies of the data and the impact
of clustering. Graphical representations of the clus-
tering models are shown in Figure 1.
Unigram Baseline: This model assumes complete
independence of the verb, syntactic slot, and
noun, and serves to provide a baseline for the
complexity of the task:
C8
BD
B4DABND7BND2B5BPC8B4DAB5C8B4D7B5C8B4D2B5
Bigram: This model predicts both the noun and
syntactic slot conditioned on the verb, but in-
dependently of one another:
C8
BE
B4DABND7BND2B5BPC8B4DAB5C8B4D7CYDAB5C8B4D2CYDAB5
Trigram: This is simply the empirical distribution
over triples of verb, slot, and noun:
C8
BF
B4DABND7BND2B5BPALC8B4DABND7BND2B5
Three-way Aspect: Following Hofmann and
Puzicha (1998), we refer to EM-based cluster-
ing as the aspect model, where different values
of the cluster variable are intended to represent
abstract “aspects” of the data. The simplest
version of the clustering model predicts verb,
slot, and noun independently given the cluster
variable CR:
C8
CR
B4DABND7BND2B5BPC8B4CRB5C8B4DACYCRB5C8B4D7CYCRB5C8B4D2CYCRB5
with all four component distributions being es-
timated by EM training.
Verb-Slot Aspect: This is the model of Rooth et al.
(1999), in which the verb and slot are com-
bined into one atomic variable before the as-
pect model is trained:
C8
CR
DAD7
BP C8B4CRB5C8B4DABND7CYCRB5C8B4D2CYCRB5
Noun-Slot Aspect: A variation on the above model
combines the slot with the noun, rather than the
verb:
C8
CR
D2D7
BP C8B4CRB5C8B4DACYCRB5C8B4D2BND7CYCRB5
Alternation: This model, described in more detail
above, introduces a new unobserved variable D6
for the semantic role of the noun, which can
take two values:
C8
CPD0D8
BP C8B4CRB5C8B4DACYCRB5C8B4D7CYCRB5C8B4D6CYD7BNCRB5C8B4D2CYD6BNCRB5
000000
000000
000000
000000
000000
111111
111111
111111
111111
111111
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
cluster
verb slot noun
00000
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
11111
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
cluster
verb,slot noun
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
verb
noun
slot
role
cluster
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
00000
00000
00000
00000
00000
11111
11111
11111
11111
11111
cluster
verb noun,slot
Alternation
Verb-Slot AspectThree-way Aspect
Noun-Slot Aspect
Figure 1: Graphical models: shading represents observed variables, arrows probabilistic dependencies.
Fixed Alternation: This model is designed to in-
corporate the assumption that the semantic
roles of the subject and object of the same verb
must be different. The independence assump-
tions are identical to those of the simple alter-
nation model:
C8
CPD0D8
BE
BP C8B4CRB5C8B4DACYCRB5C8B4D7CYCRB5C8B4D6CYD7BNCRB5C8B4D2CYD6BNCRB5
but the probability C8B4D6CYD7BNCRB5 is only trained for
D7 BP subj-intrans. The model is constrained to
assign one value of the role variable to direct
objects, C8B4D6 BP BCCYD7 BP objB5 BP BD and the other
role to subjects of transitives: C8B4D6 BP BDCYD7 BP
subj-transB5BPBD.
5 Results
Perplexity results on held-out test data for each of
the models are shown in Table 2. Because models
2, 3, 5, and 6 will assign zero probability to certain
pairs of values not seen in the training data, they
were combined with the unigram baseline model in
order to obtain a perplexity over the entire test set
comparable to the other models. This was done
using linear interpolation, with the interpolation
weight optimized on the cross-validation data. Per-
plexity is the geometric mean of the reciprocal of
the probability assigned by the model to each triple
of verb, noun, and slot in the test data:
C8C8 BP CT
A0
BD
C6
C8
CX
D0D3CV C8B4DA
CX
BND2
CX
BND7
CX
B5
For the single-variable clustering models (4, 5
and 6) 128 values were allowed for the cluster
variable CR. For the two-variable clustering mod-
els (7 and 8), 64 values for CR and 2 values for the
unobserved semantic roles variable D6 were used,
making for a total of 128 distributions over nouns
(C8B4D2CYD6BNCRB5) but only 64 over verbs (C8B4DACYCRB5). The to-
tal number of parameters for each model is shown in
Table 2. Because deterministic annealing was used
to smooth the probability distributions for each clus-
ter and prevent overfitting the training data, the per-
plexities obtained were relatively insensitive to the
number of clusters used.
Of the clustering models, the Verb-Slot Aspect
model did the best, with a perplexity of 2.31M. It is
perhaps surprising how close the Three-way Aspect
model came, with a perplexity of 2.41M, despite the
fact that it models the noun as being independent
of the syntactic position for a given verb. One ex-
planation for this is that nouns in fact occur in all
three positions more frequently than we would ex-
pect from traditional accounts of alternation behav-
ior. This is shown in our corpus examples of Table
1 by the high frequency of door as a subject of an
transitive use of open. Even in the traditional al-
ternation pattern where a noun occurs in two of the
three positions, the Three-way Aspect model may
do better at capturing this overlap, even though it
will mistakenly assign probability mass to the same
nouns appearing in the third syntactic position, than
do models 5 and 6, which are not able to generalize
Model Test Perplexity Total Parameters
1. Unigram Baseline 5.50M 20,651
2. Bigram 2.95M 57.64M
3. Trigram 2.55M 172.88M
4. Three-way Aspect 2.41M 2.64M
5. Verb-Slot Aspect 2.31M 3.47M
6. Noun-Slot Aspect 2.66M 6.56M
7. Alternation 2.57M 2.43M
8. Fixed Alternation 2.60M 2.43M
9. Trigram+Verb-Slot Aspect 2.06M 176.36M
Table 2: Comparison of probability models
at all across the different arguments of a given verb.
The models specifically designed to capture alter-
nation behavior (7 and 8) did not do as well as the
generic clustering models. One explanation is that
the unconstrained models are able to fit the data bet-
ter by clustering together specific arguments of dif-
ferent verbs even when the two verbs do not share
the same alternation behavior. Examining the clus-
ters found by the Verb-Slot Aspect shows that it in
fact seems to find alternation behavior for specific
verbs despite the model’s inability to explicitly rep-
resent alternation. In many cases, two roles of the
same verb are assigned to the same cluster. Exam-
ples of the top ten members of sample clusters are
shown in Table 3. Examining the sample verbs of
Table 1, we see that the model assigns the direct
object and subject of intransitive slots of open to
the same cluster, implicitly representing the verb’s
alternation behavior, and in fact does the same for
the semantically related verbs close and shut.Sim-
ilarly, the direct object and subject of intransitive
slots of increase are assigned to the same cluster.
However, in an example of how the model can clus-
ter semantically related verbs that do not share the
same alternation behavior, the direct object slot of
reduce and the subject of transitive slot of exceed
are groups together with increase. Of particular in-
terest is the verb play, for which the model assigns
one cluster to each of the alternation patterns noted
in Table 1. Cluster 18 represents the alternation be-
tween direct object and subject of intransitive seen
with part, game,andmusic, while cluster 92 rep-
resents the agent relation expressed by subjects of
both transitive and intransitive sentences.
The final line of Table 2 represents an interpola-
tion of the best D2-gram and best clustering model,
which further reduces perplexity to 2.06 million.
6Conclusion
We have attempted to learn the mapping from syn-
tactic position to semantic role in an unsupervised
manner, and have evaluated the results in terms of
our systems’ success as language model for unseen
data. The models designed to explicit represent verb
alternation behavior did not perform as well by this
metric as other, simpler probability models.
A perspective on this work can be gained by com-
parison with attempts at unsupervised learning of
other natural language phenomena including part-
of-speech tagging (Merialdo, 1994) and syntactic
dependencies (Carroll and Charniak, 1992; Paskin,
2001). While models trained using the Expectation
Maximization algorithm do well at fitting the data,
the results may not correspond to the human analy-
ses they were intended to learn. Language does not
exist in the abstract, but conveys information about
the world, and the ultimate goal of grammar induc-
tion is not just to model strings but to extract this
information. This suggests that although the proba-
bility models constrained to represent verb alterna-
tion behavior did not achieve the best perplexity re-
sults, they may be useful as part of an understanding
system which assigns semantic roles to arguments.
The implicit representation of alternation behavior
in our generic clustering model also suggests using
its clusters to initialize a more complex model capa-
ble of assigning semantic roles.
Acknowledgments This work was undertaken
with funding from the Institute for Research in Cog-
nitive Science at the University of Pennsylvania and
DoD Grant MDA904-00C-2136.
Cluster Id Verb-Slot Noun Cluster Id Verb-Slot Noun
57 door open-obj 18 part play-obj
mouth open-subj-intrans role form-obj
eyes close-obj lip take-obj
firebreak close-subj-intrans game bite-obj
gate shut-subj-intrans basis play-subj-intrans
shop slam-subj-intrans host lick-obj
window shut-obj parts curl-subj-intrans
way knock-obj music see-obj
exhibition reach-obj card constitute-obj
47 number increase-subj-intrans 92 people play-subj-intrans
amount require-obj man win-subj-intrans
supply reduce-obj child take-subj-trans
level increase-obj woman make-subj-trans
rate exceed-subj-trans one need-subj-trans
tooth need-obj the play-subj-trans
income include-obj band see-subj-trans
risk affect-obj group get-subj-intrans
activity show-obj team manage-subj-intrans
Table 3: Sample Clusters from Verb-Slot Aspect Model

References

Don Blaheta and Eugene Charniak. 2000. Assigning
function tags to parsed text. In Proceedings of the 1st
Annual Meeting of the North American Chapter of the
ACL (NAACL), pages 234–240, Seattle, Washington.

Glenn Carroll and Eugene Charniak. 1992. Two experi-
ments on learning probabilistic dependency grammars
from corpora. In Workshop Notes for Statistically-
Based NLP Techniques, pages 1–13. AAAI.

Eugene Charniak. 1997. Statistical parsing with a
context-free grammar and word statistics. In AAAI-
97, pages 598–603, Menlo Park, August. AAAI Press.

Ciprian Chelba and Frederick Jelinek. 1999. Recogni-
tion performance of a structured language model. In
EUROSPEECH.

Michael Collins. 1997. Three generative, lexicalised
models for statistical parsing. In Proceedings of the
35th ACL, pages 16–23, Madrid, Spain.

Christiane Fellbaum, editor. 1998. WordNet: An Elec-
tronic Lexical Database. MIT Press, Cambridge,
Massachusetts.

Thomas Hofmann and Jan Puzicha. 1998. Statistical
models for co-occurrence data. Memo, Massachus-
setts Institute of Technology Artificial Intelligence
Laboratory, February.

Beth Levin. 1993. English Verb Classes And Alter-
nations: A Preliminary Investigation. University of
Chicago Press, Chicago.

David Magerman. 1995. Statistical decision-tree models
for parsing. In Proceedings of the 33rd ACL,Cam-
bridge, Massachusetts.

Mitchell P. Marcus, Grace Kim, Mary Ann Marcin-
kiewicz, Robert MacIntyre, Ann Bies, Mark Fergu-
son, Karen Katz, and Britta Schasberger. 1994. The
Penn Treebank: Annotating predicate argument struc-
ture. In ARPA Human Language Technology Work-
shop, pages 114–119, Plainsboro, NJ. Morgan Kauf-
mann.

Diana McCarthy. 2000. Using semantic preferences to
identify verbal participation in role switching alterna-
tions. In Proceedings of the 1st NAACL, pages 256–
263, Seattle, Washington.

Bernard Merialdo. 1994. Tagging English text with
a probabilistic model. Computational Linguistics,
20(2):155–172.

Paola Merlo and Suzanne Stevenson. 2001. Auto-
matic verb classification based on statistical distribu-
tion of argument structure. Computational Linguis-
tics, 27(3), September.

Mark Paskin. 2001. Grammatical bigrams. In T. Di-
etterich, S. Becker, and Z. Gharahmani, editors, Ad-
vances in Neural Information Processing Systems
(NIPS) 14. MIT Press.

Fernando Pereira, Naftali Tishby, and Lillian Lee. 1993.
Distributional clustering of English words. In Pro-
ceedings of the 31st ACL, pages 183–190, Columbus,
Ohio. ACL.

Brian Roark. 2001. Probabilistic top-down parsing
and language modeling. Computational Linguistics,
27(2):249–276.

Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Car-
roll, and Franz Beil. 1999. Inducing a semantically
annotated lexicon via EM-based clustering. In Pro-
ceedings of the 37th Annual Meeting of the ACL, pages
104–111, College Park, Maryland.

Sabine Schulte im Walde. 2000. Clustering verbs se-
mantically according to their alternation behaviour. In
In Proceedings of the 18th International Conference
on Computational Linguistics (COLING-00), pages
747–753, Saarbr¨ucken, Germany.

XTAG Research Group. 2001. A lexicalized tree adjoin-
ing grammar for English. Technical Report IRCS-01-
03, IRCS, University of Pennsylvania.
