Proposition Bank II:  Delving Deeper 
 
 
Olga Babko-Malaya, Martha Palmer, Nianwen Xue, Aravind Joshi
1
, Seth Kulick 
University of Pennsylvania 
{malayao/mpalmer/xueniwen/joshi/skulick}@linc.cis.upenn.edu 
                                                           
1
 Associated with Penn Discourse Treebank (PDTB). Other members of the project are Eleni Miltsakaki, Rashmi Prasad, 
(Univ. of PA) and Bonnie Webber (Univ. of Edinburgh) 
 
 
Abstract 
The PropBank project is creating a corpus of 
text annotated with information about basic 
semantic propositions. PropBank I (Kingsbury 
& Palmer, 2002) added a layer of predicate-
argument information, or semantic roles, to 
the syntactic structures of the English Penn 
Treebank.   This paper presents an overview 
of the second phase of PropBank Annotation, 
PropBank II, which is being applied to Eng-
lish and Chinese, and includes (Neodavid-
sonian) eventuality variables, nominal 
references, sense tagging, and connections to 
the Penn Discourse Treebank (PDTB), a pro-
ject for annotating discourse connectives and 
their arguments. 
1 Introduction 
An important question is the degree to which current 
statistical NLP systems can be made more domain-
independent without prohibitive costs, either in terms of 
engineering or annotation.  The Proposition Bank is 
designed as a broad-coverage resource to facilitate the 
development of more general systems.  It focuses on the 
argument structure of verbs, and provides a complete 
corpus annotated with semantic roles, including partici-
pants traditionally viewed as arguments and ad-
juncts.  Correctly identifying the semantic roles of the 
sentence constituents is a crucial part of interpreting 
text, and in addition to forming a component of the in-
formation extraction problem, can serve as an interme-
diate step in machine translation or automatic 
summarization. 
 
The Proposition Bank project takes a practical approach 
to semantic representation, adding a layer of predicate-
argument information, or semantic roles, to the syntactic 
structures of the Penn Treebank.  The resulting resource 
can be thought of as shallow, in that it does not repre-
sent co-reference, quantification, and many other 
higher-order phenomena, but also broad, in that it cov-
ers every verb in the corpus and allows representative 
statistics to be calculated. The semantic annotation pro-
vided by PropBank is only a first approximation at cap-
turing the full richness of semantic representation. 
Additional annotation of nominalizations and other 
noun predicates has already begun at NYU. This paper 
presents an overview of the second phase of PropBank 
Annotation, PropBank II, which is being applied to Eng-
lish and Chinese and includes (Neodavidsonian) eventu-
ality variables, nominal references, sense tagging, and 
discourse connectives.   
2 PropBank I 
PropBank (Kingsbury & Palmer, 2002) is an annotation 
of the Wall Street Journal portion of the Penn Treebank 
II (Marcus, 1994) with `predicate-argument' structures, 
using sense tags for highly polysemous words and se-
mantic role labels for each argument. An important goal 
is to provide consistent semantic role labels across dif-
ferent syntactic realizations of the same verb, as in the 
window in [
ARG0
 John] broke [
ARG1
 the window] and 
[
ARG1
 The window] broke. PropBank can provide fre-
quency counts for (statistical) analysis or generation 
components in a machine translation system, but pro-
vides only a shallow semantic analysis in that the anno-
tation is close to the syntactic structure and each verb is 
its own predicate. 
 
In PropBank, semantic roles are defined on a verb-by-
verb basis.  An individual verb's semantic arguments are 
simply numbered, beginning with 0.  Polysemous verbs 
have several Framesets, corresponding to a relatively 
coarse notion of word senses, with a separate set of 
numbered roles, a roleset, defined for each Frameset.  
For instance, leave has both a DEPART Frameset ([
ARG0 
John] left [
ARG1
 the room]) and a GIVE Frameset, ([
ARG0
 
I] left [
ARG1
 my pearls] [
ARG2
 to my daughter-in-law] 
[
ARGM-LOC
 in my will].)   While most Framesets have 
three or four numbered roles, as many as six can appear, 
in particular for certain verbs of motion. Verbs can take 
any of a set of general, adjunct-like arguments 
(ARGMs), such as LOC (location), TMP (time), DIS 
(discourse connectives), PRP (purpose) or DIR (direc-
tion).  Negations (NEG) and modals (MOD) are also 
marked. 
 
The same annotation philosophy has been extended to 
the Penn Chinese Proposition Bank (Xue and Palmer, 
2003). The Chinese PropBank annotation is performed 
on a smaller (250k words) and yet growing corpus an-
notated with syntactic structures (Xue et al 2004). The 
same syntactic alternations that form the basis for the 
English PropBank annotation also exist in robust quanti-
ties in Chinese, even though it may not be the case that 
the same exact verbs (meaning verbs that are close 
translations of one another) have the exact same range 
of syntactic realization for Chinese and English.  For 
example, in (1), "xin-nian/New Year  zhao-dai-
hui/reception" plays the same role in (a) and (b), which 
is the event or activity held,  even though it occurs in 
different syntactic positions. Assigning the same argu-
ment label, Arg1, to both instances, captures this regu-
larity. It is worth noting that the predicate “ju-
xing/hold" does not have passive morphology in (1a), 
despite of what its English translation suggests. Like the 
English PropBank, the adjunct-like elements receive 
more general labels like TMP or LOC, as also illustrated 
in (1). The tag set for Chinese and English PropBanks 
are to a large extent similar and more details can be 
found in (Xue and Palmer, 2003). 
  
(1) a. [ARG1 xin-nian/New Year zhao-dai-
hui/reception] [ARGM-TMP jin-tian/today] [ARGM-
LOC zai/at diao-yu-tai/Diaoyutai guo-bin-guan/state 
guest house ju-xing/hold]  
"The New Year reception was held in Diaoyutai State 
Guest House today." 
 
          b. [ARG0 tang-jia-xuan/Tang Jiaxuan] [ARGM-
TMP jin-tian/today] [ARGM-LOC zai/at diao-yu-
tai/Diaoyutai guo-bin-guan/state guest house] ju-
xing/hold [arg1 xin-nian/New Year zhao-dai-
hui/reception] 
"Tang Jiaxuan was holding the New Year Reception in 
Diaoyutai State Guest House today." 
 
For polysemous verbs that take different sets of seman-
tic roles, we also distinguish different Framesets. (2) 
and (3) illustrate the different Framesets of "tong-
guo/pass", which correspond loosely with major senses 
of the verb.  The Frameset in (2) roughly means "pass 
by voting" while the Frameset illustrated by (3) means 
"pass through". The different Framesets are generally 
reflected in the different alternation patterns, which can 
serve as a cue for statistical systems performing Frame-
set disambiguation. (2) is similar to the causa-
tive/inchoative alternation (Levin, 1993). In contrast, (3) 
shows object drop. 
 
(2) a. [ARG0 guo-hui/Congress] zui-jin/recently tong-
guo/pass le/ASP [ARG1 zhou-ji/interstate yin-hang-
fa/banking law] 
    "The U.S. Congress recently passed the inter-state 
banking law." 
      b. [ARG1 zhou-ji/interstate yin-hang-fa/banking 
law] zui-jin/recently tong-guo/pass le/ASP 
       "The inter-state banking law passed recently." 
 
(3) a. [ARG0 huo-che/train] zheng-zai/now tong-
guo/pass [ARG1 sui-dao/tunnel] 
       "The train is passing through the tunne." 
        b. [ARG0 huo-che/train]  zheng-zai/now  tong-
guo/pass. 
       "The train is passing." 
 
There are also some notable differences between Chi-
nese PropBank and English PropBank. In general, the 
verbs in the Chinese PropBank are less polysemous, 
with the vast majority of the verbs having just one 
Frameset. On the other hand, the Chinese PropBank has 
more verbs (including static verbs which are generally 
translated into adjectives in English) normalized by the 
corpus size.  
3 Adding Event Variables to PropBank 
Event variables provide a rich analytical tool for analyz-
ing verb meaning. Positing that there is an event vari-
able allows for a straightforward representation of the 
logical form of adverbial modifiers, the capturing of 
pronominal reference to events, and the representation 
of nouns that refer to events. For example, event vari-
ables make it possible to have direct reference to an 
event with a noun phrase, as in (4a) destruction, and to 
refer back to an event with a pronoun (as illustrated in 
(4b) That): 
 
(4) a. The destruction of Pompeii happened in the 1
st
 
century. 
       b. Brutus stabbed Caesar. That was a pivotal event 
in history. 
 
PropBank I annotations can be translated straightfor-
wardly into logical representations with event variables, 
as illustrated in (5), with relations being defined as 
predicates of events, and Args and ArgMs representing 
relations between event variables and corresponding 
phrases.   
 
(5) a. Mr. Bush met him privately, in the White House,          
on Thursday. 
 
     b. PropBank annotation 
  Rel:  met                      
       Arg0: Mr. Bush  
       ArgM-MNR: privately 
       ArgM-LOC: in the White House 
       ArgM-TMP: on Thursday 
 
     c. Logical representation with an event variable  
 ∃e meeting(e) & Arg0(e, Mr. Bush) & Arg1(e, he) 
& MNR(e, privately) & LOC(e, ‘in the White 
House’) & TIME(e, ‘on Thursday’) 
 
As the representation in (5c) shows, we adopt Neo-
davidsonian analysis of events, which follows Parsons 
(1990) in treating arguments on a par with modifiers in 
the event structure. An alternative analysis is the origi-
nal Davidsonian analysis of events (Davidson 1967), 
where the arguments of the verb are analyzed as its 
logical arguments. 
  
Our choice of a Neodavidsonian representation is moti-
vated by its predictions with respect to obligatoriness of 
arguments. Under the Davidsonian approach, arguments 
are logical arguments of the verb and thus must be im-
plied by the meaning of the sentence, either explicitly or 
implicitly (i.e. existentially quantified). On the other 
hand, it has been a crucial assumption in PropBank that 
not all roles must necessarily be present in each sen-
tence. For example, the Frameset for the verb serve, 
shown in (6a) has three roles: Arg0, Arg1, and Arg2. 
Actual usages of the verb, on the other hand, do not 
require the presence of all three roles. For example, the 
sentence in (6b), as its PropBank annotation in (6c) 
shows, does not include Arg1. 
 
(6)  a.  serve.01 "act, work":            
Arg0:worker 
Arg1:job, project 
Arg2:employer 
 
     b.  Each new trading roadblock is likely to be beaten     
by institutions seeking better ways *trace* to serve 
their high-volume clients.  
 
c. Arg0:  *trace* -> institutions 
     REL:    serve 
     Arg2:   their high-volume clients 
 
As the representations in (7) illustrate, only the Neo-
davidsonian representation gives the correct interpreta-
tion of this sentence. 
 
(7) Davidsonian representation: 
     ∃e ∃z serve(e, institutions, z, their high-volume 
clients) 
 
Neodavidsonian representation: 
     ∃e serve(e)&Arg0(e, institutions)&Arg2(e, their 
high-volume clients) 
 
Assuming a Neodavidsonian representation, we can 
analyze all Args and certain types of modifiers as predi-
cates of events.  The types of ArgMs that can be ana-
lyzed as predicates of event variables are shown below: 
 
• MNR:   to manage businesses profitably 
• TMP:    to run the company for 23 years 
• LOC:    to use the notes on the test 
• DIR:     to jump up 
• CAU:    because of … 
• PRP:      in order to … 
 
Whereas for the most part, translating these adverbials 
into modifiers of event variables does not require man-
ual annotation, certain constructions need human revi-
sion. For example, in the sentence in (8a) the temporal 
ArgM ‘for the past five years’ does not modify the event 
variable e introduced by the verb manage, as our auto-
matic translation would predict. The revised analysis of 
this sentence, given in (8b), follows Krifka 1989, who 
proposed that negated sentences refer to maximal events 
– events that have everything that happened during their 
running time as a part. Annotation of this sentence 
would thus require us to introduce an additional event 
variable, the maximal event e’, which has a duration 
‘for the past five years’ and has no event of unions 
managing wage increases as part. 
 
(8)  a. For the past five years, unions have not managed 
to win wage increases. 
       b. ∃e’ TMP(e’, ‘for the past five years’) &  
∃¬e(e<e’ & managing(e) & Arg0(e, unions) & 
Arg1(e, ‘win wage increases’)) 
 
Further annotation involves linking empty categories in 
PropBank to event variables in cases of control, as illus-
trated in (9), where event variables can be viewed as the 
appropriate antecedents for PRO, marked as ‘*’ below: 
    
(9) The car collided with a lorry, * killing both drivers. 
 
And, finally, we will consider tagging variables accord-
ing to the aspectual class of the eventuality they denote, 
such as states or events. Events, such as John built a 
house, involve some kind of change and usually imply 
that some condition, which obtains when the event be-
gins, is terminated by the event. States, on the other 
hand, do not involve any change and hold for varying 
amounts of time. It does not make sense to ask how long 
a state took (as opposed to events), and whether the 
state is culminated or finished.  
 
This distinction between states and events plays an im-
portant role for the temporal analysis of discourse, as 
the following examples (from Kamp and Reyle 1993) 
illustrate: 
 
(10) a. A man entered the White Hart. Bill served him a 
beer. 
       b. I arrived at the Olivers’ cottage on Friday night. 
It was not a propitious beginning to my visit. She 
was ill and he in a foul mood. 
 
If a non-initial sentence denotes an event, then it is typi-
cally understood as following the event described by the 
preceding sentence. For example, in (10a), the event of 
Bill serving a beer is understood as taking place after 
the event of ‘a man entering the White Hart’ was com-
pleted.  On the other hand, states are interpreted as tem-
porally overlapping with the time of the preceding 
sentence, as illustrated in (10b). The sentences she was 
ill and he was in a foul mood seem to describe a state of 
affairs obtaining at the time of the speaker’s arrival.  
 
As this example illustrates, there are different types of 
temporal relations between eventualities (as we will call 
both events and states) and adverbials that modify them, 
such as temporal overlap and temporal containment. 
Furthermore, these relations crucially depend on the 
aspectual properties of the sentence. Translation of PB 
annotations to logical representations with eventuality 
variables and tagging these variables according to their 
aspectual type would thus make it possible to provide an 
analysis of temporal relations. This analysis should also 
be compatible with a higher level of annotation of tem-
poral structure (e.g. Ferro et al, 2001).   
4 Annotation of Nominal Coreference 
Our approach to coreference annotation is based on the 
recognition of the different types of relationships that 
might be called "coreference".  The most straightfor-
ward case is that of two semantically definite NPs that 
refer to identical entities, as in (11).  Anaphoric rela-
tions (very broadly defined) are those in which one NP 
(or possessive adjective) has no referential value of its 
own but depends on an antecedent for its interpretation. 
In some cases this can be relatively simple, as in (12), in 
which the pronoun He takes John Smith as its antece-
dent.  However, in some cases, as in (13), the antecedent 
may not even be a referring expression, or can, as in 
(14), refer to an entity that may or may not exist, with 
the non-existent a car being the antecedent of it.  The 
anaphor does not have to be an NP, as in (15), in which 
the possessive their, which takes many companies as its 
antecedent, is an adjective. 
 
(11) John Smith of Company X arrived yesterday.  Mr. 
Smith said that..." 
(12) John Smith of Company X arrived yesterday.  He 
said that..." 
(13) No team spoke about its system. 
(14) I want to buy a car.  I need it to go to work. 
(15) Many companies raised their payouts by more than 
10%. 
 
Another level of complexity is raised by NPs that are 
not anaphors, in that they have their own reference (per-
haps abstract or nonexistent), but are not in an identity 
relationship with an antecedent, but rather describe a 
property of that antecedent.  Typical cases of this are 
predicate nominals, as in (16), or appositives, as in (17), 
and other cases as in (18). 
 
(16) Larry is a university lecturer. 
(17) Larry, the chair of his department, became presi-
dent. 
(18) The stock price fell from $4.02 to $3.85 
 
As has been discussed (e.g., van Deemter & Kibble, 
2001), such cases have fundamentally different proper-
ties than either the identity relationships of (11) or the 
anaphoric relationships of (12)-(15).   
 
Annotation of nominal co-reference is being done in 
two passes. The first pass involves annotation of true 
co-reference between semantically definite NPs`. The 
issue here is to consider what the semantically definite 
nouns are.  Initially, they are defined as proper nouns 
(named entities), either as NPs (America) or prenominal 
adjectives (American politicians).   
 
(19) The last time the S&P 500 yield dropped below 3% 
was in the summer of 1987... There have been only 
seven other times when the yield on the S&P 500 
dropped....   
 
It is reasonable to expand this to definite descriptions, 
so that in (19), the S&P 500 yield and the yield on the 
S&P 500 are marked as coreferring.  However, some 
definite NPs can refer to clauses, not NPs, such as The 
pattern in (20), and we will not do such cases of clausal 
antecedents on the first pass. 
 
(20) The index fell 40% in 1975 and jumped 80% in 
1976.  The pattern is an unusual one. 
 
Anaphoric relations are being done on a "need-to-
annotate" basis.   For each anaphoric NP or possessive 
adjective, the annotator needs to determine its antece-
dent.  As discussed, this is a different type of relation 
than identity, and this distinction will be noted in the 
annotation. The issue here is what we consider an ana-
phoric element to be.  We consider all cases of pro-
nouns, possessives, reflexives, and NPs with that/those 
to be potential cases of anaphors (again, broadly de-
fined). However, as with definite NPs, we only mark 
those that have an NP antecedent, and not clausal ante-
cedents.  For example, in (21), it refers to the current 
3.3% reading, and so would be marked as being in an 
antecedent-anaphor relation. In (22), it refers to having 
the dividend increases, which is not an NP, and so 
would not be marked as being in an anaphor relation in 
the first pass. Similar considerations apply to potential 
anaphors like those NP, that NP, etc. 
 
 (21) ...the current 3.3% reading isn't as troublesome as 
it might have been. 
(22) Having the dividend increases is a supportive ele-
ment in the market outlook, but I don't think it's a 
main consideration". 
 
Note that placing the burden on the anaphors to deter-
mine what gets marked as being in an anaphor-
antecedent leaves it open as to what the antecedent 
might be, other than the requirement just mentioned of it 
being an NP.  Not only might it be non-referring NPs as 
in  (13) or (14), it could even be a generic, as in (23), in 
which books is the antecedent for they. 
 
(23) I like books.  They make me smile. 
 
The second pass will tackle the more difficult issues: 
 
1. Descriptive NPs, as in (16)-(18).  While the informa-
tion provided by these cases would be extremely valu-
able for information extraction and other systems, there 
are some uncertain issues here, mostly focusing on how 
such descriptors describe the antecedent at different 
moments in time and/or space.  The crucial question is 
therefore what to take the descriptor to be.   
 
(24) Henry Higgins might become the president of 
Dreamy Detergents. 
 
For example, in (18), it can't be just $4.02 and $3.85, 
since this does not include information about *when* 
the stock price had such values. The same issue arises 
for (17).  As van Deemter & Kibble point out, such 
cases can interact with issues of modality in uncertain 
ways, as illustrated in (24).  Just saying that in (24) the 
president of Dreamy Detergents is in the same type of 
relationship with Henry Higgins as a university lecturer 
is with Larry in (16) would be very misleading. 
 
2. Clausal antecedents - Here we will handle cases of it 
and other anaphor elements and definite NPs referring 
to non-NPs as antecedents, as in (21).  This will most 
likely be done by referring to the eventuality variable 
associated with the antecedent. 
5 Linking to the Penn Discourse Treebank 
(PDTB)  
The Penn Discourse Treebank (PDTB) is currently be-
ing built by the PDTB team at the University of Penn-
sylvania, providing the next appropriate level of 
annotation: the annotation of the predicate argument 
structure of connectives (Miltsakaki et al 2004a/b). The 
PDTB project is based on the idea that discourse con-
nectives can be thought of as predicates with their asso-
ciated argument structure. This perspective of discourse 
is based on a series of papers extending lexicalized tree-
adjoining grammar (LTAG) to discourse (DLTAG), 
beginning with Webber and Joshi (1998).
2
  This level of 
annotation is quite complex for a variety of reasons, 
such as the lack of available literature describing dis-
course connectives and frequent occurrences of empty 
(lexically null) connectives between two sentences that 
cannot be ignored. Also, unlike the predicates at the 
sentence level, some of the discourse connectives, espe-
cially discourse adverbials, take their arguments ana-
phorically and not structurally, requiring an intimate 
association with event variable representation.  
 
The long-range goal of the PDTB project is to develop a 
large scale and reliably annotated corpus that will en-
code coherence relations associated with discourse con-
nectives, including their argument structure and 
anaphoric links, thus exposing a clearly defined level of 
discourse structure and supporting the extraction of a 
range of inferences associated with discourse connec-
tives. This annotation will reference the Penn Treebank 
(PTB) annotations as well as PropBank. 
 
In PDTB, a variety of connectives are considered, such 
as subordinate and coordinate conjunctions, adverbial 
connectives and implicit connectives amounting to a 
total of approximately 20,000 annotations; 10,000 im-
                                                           
2
 The PDTB annotations are deliberately kept independ-
ent of DLTAG framework for two reasons: (1) to make the 
annotated corpus widely useful to researchers working in 
different frameworks and (2) to make the annotation task 
easier, thereby increasing interannotator reliability. 
plicit connectives and 10,000 annotations of the 250 
explicit connectives identified in the corpus (for details 
see (Miltsakaki et al 2004a and Miltsakaki et al 2004b).  
Current annotations in PDTB are performed by four 
annotators. Individual annotation proceeds one connec-
tive at a time. This way, the annotators quickly gain 
experience with that connective and develop a better 
understanding of its predicate-argument characteristics. 
For the annotation of implicit connectives, the annota-
tors are required to provide an explicit connective that 
best expressed the inferred relation. 
 
The PDTB is expected to be released by November 
2005. The final version of the corpus  will also contain 
characterizations of the semantic roles associated with 
the arguments of each type of connective as well as 
links to PropBank. 
6.   Annotation of Word Senses 
The critical question with respect to sense tagging in-
volves the choice of senses.  In other words, which 
sense inventory, and which level of granularity with 
respect to that sense inventory?  The PropBank Frames 
Files for the verbs include coarse-grained sense distinc-
tions based primarily on usages of a verb that have dif-
ferent numbers of predicate-arguments. These are 
termed Framesets – referring to the set of roles for each 
one and the corresponding set of syntactic frames.  We 
are currently sense-tagging the annotated predicates for 
lemmas with multiple Framesets, which can be done 
quickly and accurately with an inter-annotator agree-
ment of over 90%.  The distinctions made by the 
Framesets are very coarse, and each one would map to 
several standard dictionary entries for the lemma in 
question.  More fine-grained sense distinctions could be 
useful for Automatic Content Extraction, yet it remains 
to be determined exactly which distinctions are neces-
sary and what methodology should be followed to pro-
vide additional word sense annotation. 
 
Palmer et al (2004b) present an hierarchical approach to 
verb senses, where different levels of sense distinctions, 
from PropBank Framesets to WordNet senses, form a 
continuum of granularity. At the intermediate level of 
sense hierarchy we are considering manual groupings of 
the SENSEVAL-2 verb senses (Palmer, et.al., 2004a), 
developed in a separate project. Given a large disagree-
ment rate between annotators (average inter-annotator 
agreement rate for Senseval-2 verbs was only 71%), 
verbs were grouped by two or more people into sets of 
closely related senses, with grouping differences being 
reconciled, and the sense groups were used for coarse-
grained scoring of the systems. These groupings of 
WordNet senses were shown to reconcile a substantial 
portion of the manual and automatic tagging disagree-
ments, showing that many of these disagreements are 
fairly subtle.  Using the groups as a more coarse-grained 
set of sense distinctions improved ITA and system 
scores by almost 10%, to 82% and 69%, respectively 
(Palmer, et. al. 2004a). 
 
We have been investigating whether or not the groups 
can provide an intermediate level of hierarchy in be-
tween the PropBank Framesets and the WN 1.7 senses.  
Based on our existing WN 1.7 tags and Frameset tags of 
the Senseval2 verbs in the Penn Treebank, 95% of the 
verb instances map directly from sense groups to 
Framesets, with each Frameset typically corresponding 
to two or more sense groups. Using the PropBank 
coarse-grained senses as a starting place, and WordNet 
sense tagging for over 1000 verbs produced automati-
cally through mapping VerbNet to PropBank (Kipper, 
et. al., 2004), we have the makings of a large scale tag-
ging experiment on the Penn Treebank.  This will en-
able investigations into the applicability of clearly 
defined criteria for sense distinctions at varying levels 
of granularity, and produce a large, 1M word corpus of 
sense-tagged text for training WSD systems 
 
The hierarchical approach to verb senses, as utilized by 
most standard dictionaries as well as Hector (Atkins, 
’93), and as applied to SENSEVAL-2, presents obvious 
advantages for the problem of Word Sense Disambigua-
tion. The human annotation task is simplified, since 
there are fewer choices at each level and clearer distinc-
tions between them.  The automated systems can com-
bine training data from closely related senses to 
overcome the sparse data problem, and both humans 
and systems can back off to a more coarse-grained 
choice when fine-grained choices prove too difficult.  
 
Conclusion 
This paper has presented an overview of the second 
phase of PropBank Annotation, PropBank II, which is 
being applied to English and Chinese. It  includes (Neo-
davidsonian) eventuality variables, nominal references, 
an hierarchical approach to sense tagging, and connec-
tions to the Penn Discourse Treebank (PDTB), a project 
for annotating discourse connectives and their argu-
ments. 

References 
 
Atkins, S. (1993) Tools for computer-aided corpus lexi-
cography: The Hector Project.  Actu Linguistica Hunguricu, 
41:5-72. 
 
Carlson, L., Marcu, D. and Okurowski, M. E. (2002). 
Building a Discourse-Tagged Corpus in the Framework of 
Rhetorical Structure Theory. In Current Directions in Dis-
course and Dialogue, Jan van Kuppevelt and Ronnie Smith 
eds., Kluwer Academic Publishers. To appear. 
 
Davidson, D. 1967.  The Logical Form of Action Sen-
tences. In The Logic of Decision and Action, ed. Nicholas 
Rescher.  81--95. Pittsburgh: University of Pittsburgh 
Press.  Republished in Donald Davidson, Essays on Actions 
and Events,Oxford University Press, Oxford, 1980. 
 
Edmonds, P. and Cotton, S. 2001. SENSEVAL-2: Over-
view. In Proceedings of SENSEVAL-2: Second International 
Workshop on Evaluating Word Sense Disambiguation Sys-
tems, ACL-SIGLEX, Toulouse, France. 
 
Ferro L, I. Mani, B. Sundheim and G.Wilson 2001 TIDES 
Temporal Annotation Guidelines, MITRE Technical Report, 
MTR 01W0000041.  
 
Kamp, H. and U.Reyle. 1993. From Discourse to Logic, 
Kluwer, Dordrecht. 
 
Kingsbury, P. and Palmer, M, (2002), From TreeBank to 
PropBank, Third International Conference on Language  Re-
sources and Evaluation, LREC-02, Las Palmas, Canary Is-
lands, Spain, May 28- June 3. 
 
Kilgarriff, A. and Palmer, M.. 2000. Introduction to the 
special issue on Senseval, Computers and the Humanities, 
34(1-2):1-13. 
 
Kipper K., B. Snyder, and M. Palmer. (to appear, 2004) 
"Extending a verb-lexicon using a semantically annotated 
corpus". Proceedings of the 4th International Conference on 
Language Resources and Evaluation (LREC-04). Lisbon, 
Portugal, 2004. 
 
Krifka, M. 1989. Nominalreferenz und Zeitkonstitution. 
München, Wilhelm Fink Verlag 
 
Levin, B. 1993. English Verb Classes and Alternations: a 
Preliminary Investigation. Chicago: The University of Chi-
cago Press. 
 
Mann, W. and S. Thompson. 1986. “Relational Proposi-
tions in Discourse”, Discourse Processes 9, 57-90. 
Marcu, D. 2000. The Theory and Practice of Discourse 
Parsing and Summarization. The MIT Press. 
Miltsakaki, E., R. Prasad, A. Joshi and B. Webber. 2004a. 
The Penn Discourse Treebank. In Proceedings of the 4th In-
ternational Conference on Language Resources and Evalua-
tion (LREC 2004), Lisbon. 
 
Miltsakaki, E., R. Prasad, A. Joshi and B. Webber. 2004b. 
Annotation of Discourse Connectives and Their Arguments, in 
Proceedings of the HLT-EACL Workshop on Frontiers in 
Corpus Annotation, Boston, Massachussetts. 
Palmer, M., Dang, H. T, and Fellbaum, C., 2004a.  Making 
fine-grained and coarse-grained sense distinctions, both manu-
ally and automatically, under revision for Natural Language 
Engineering. 
Palmer, M., Babko-Malaya, O., Dang, H. T., 2004b. Dif-
ferent Sense Granularities for Different Applications, to ap-
pear in the Scalable Natural Language Understanding 
Workshop, held in conjunction with HLT/NAACL-04, May, 
2004. 
 
Parsons, T. 1990.  Events in the Semantics of Eng-
lish.  Cambridge, MA: MIT Press. 
  
van Deemter, K. and R. Kibble. 2000. “On Coreferring: 
Coreference in MUC and Related Annotation Schemes”, 
Computational Linguistics 26:629-637. 
 
Webber B. and A. Joshi. 1998. Anchoring a lexicalized 
tree-adjoining grammar for discourse. In ACL/COLING 
Workshop on Discourse Relations and Discourse Markers, 
Montreal, Canada, pp. 41-48.  
 
Xue, N. and Palmer, M. 2003. Annotating the Propositions 
in the Penn Chinese Treebank. In the Proceedings of the Sec-
ond SIGHAN Workshop on Chinese Language Processing.  
Sapporo, Japan. 
  
Xue, Nianwen, Xia, Fei, Chiou, Fu-dong and Palmer, 
Martha. 2004. The Penn Chinese Treebank: phrase structure 
annotation of a large corpus. Natural Language Engineering, 
10(4):1-30, June 2004.  
