Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 61–67,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
A Parallel Proposition Bank II for Chinese and English∗
Martha Palmer, Nianwen Xue, Olga Babko-Malaya, Jinying Chen, Benjamin Snyder
Department of Computer and Information Science
University of Pennsylvania
{mpalmer/xueniwen/malayao/Jinying/bsnyder3}@linc.cis.upenn.edu
Abstract
The Proposition Bank (PropBank) project
is aimed at creating a corpus of text an-
notated with information about seman-
tic propositions. The second phase of
the project, PropBank II, adds additional
levels of semantic annotation which in-
clude eventuality variables, co-reference,
coarse-grained sense tags, and discourse
connectives. This paper presents the re-
sults of the parallel PropBank II project,
which adds these richer layers of semantic
annotation to the first 100K of the Chinese
Treebank and its English translation. Our
preliminary analysis supports the hypoth-
esis that this additional annotation recon-
ciles many of the surface differences be-
tween the two languages.
1 Introduction
There is a pressing need for a consensus on a task-
oriented level of semantic representation that can en-
able the development of powerful new semantic ana-
lyzers in the same way that the Penn Treebank (Mar-
cus et al., 1993) enabled the development of sta-
tistical syntactic parsers (Collins, 1999; Charniak,
2001). We believe that shallow semantics expressed
as a dependency structure, i.e., predicate-argument
structure, for verbs, participial modifiers, and nom-
inalizations provides a feasible level of annotation
that would be of great benefit. This annotation, coupled
with word senses, minimal co-reference links,
event identifiers, and discourse and temporal relations,
could provide the foundation for a major advance
in our ability to automatically extract salient
relationships from text. This will in turn facilitate
breakthroughs in message understanding, machine
translation, fact retrieval, and information retrieval.
∗This work is funded by the NSF via Grant EIA02-05448.
The Proposition Bank project is a major step towards
providing this type of annotation. It takes a prac-
tical approach to semantic representation, adding a
layer of predicate argument information, or seman-
tic roles, to the syntactic structures of the Penn Tree-
bank (Palmer et al., 2005). The Frame Files that
provide guidance to the annotators constitute a rich
English lexicon with explicit ties between syntactic
realizations and coarse-grained senses, Framesets.
PropBank Framesets are distinguished primarily
by syntactic criteria such as differences in
subcategorization frames, and can be seen as the
top level of a hierarchy of sense distinctions. Groupings
of fine-grained WordNet senses, such as those
developed for Senseval2 (Palmer et al., to appear)
provide an intermediate level, where groups are dis-
tinguished by either syntactic or semantic criteria.
WordNet senses constitute the bottom level. The
PropBank Frameset distinctions, which can be made
consistently by humans and systems (over 90% ac-
curacy for both), are surprisingly compatible with
the groupings; 95% of the groups map directly onto
a single PropBank frameset sense (Palmer et al.,
2004).
The semantic annotation provided by PropBank
is only a first approximation at capturing the full
richness of semantic representation. Additional an-
notation of nominalizations and other noun pred-
icates has already begun at NYU. This paper de-
scribes the results of PropBank II, a project to pro-
vide richer semantic annotation to structures that
have already been propbanked, specifically, eventuality
IDs, coreference, coarse-grained sense tags,
and discourse connectives. Of special interest to the
machine translation community is our finding, pre-
sented in this paper, that PropBank II annotation rec-
onciles many of the surface differences of the two
languages.
2 PropBank I
PropBank (Palmer et al., 2005) is an annotation of
the Wall Street Journal portion of the Penn Treebank
II (Marcus et al., 1994) with ‘predicate-argument’
structures, using sense tags for highly polysemous
words and semantic role labels for each argument.
An important goal is to provide consistent seman-
tic role labels across different syntactic realizations
of the same verb, as in the window in [ARG0 John]
broke [ARG1 the window] and [ARG1 The window]
broke. PropBank can provide frequency counts for
(statistical) analysis or generation components in
a machine translation system, but provides only a
shallow semantic analysis in that the annotation is
close to the syntactic structure and each verb is its
own predicate.
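To make the idea of consistent role labels across syntactic alternations concrete, the following is a minimal sketch that reads the bracketed notation used in the examples above. The function names `parse_roles` and `filler` are hypothetical helpers introduced here for illustration; they are not part of any PropBank tooling.

```python
import re

# Matches one labeled span such as "[ARG1 the window]".
SPAN = re.compile(r"\[(\S+)\s+([^\]]+)\]")


def parse_roles(annotated):
    """Return (label, text) pairs for every bracketed span in the string."""
    return [(label.upper(), text.strip()) for label, text in SPAN.findall(annotated)]


def filler(roles, label):
    """Return the lower-cased text carrying `label`, or None if it is absent."""
    for role, text in roles:
        if role == label:
            return text.lower()
    return None


# The two realizations of "break" quoted in the text.
transitive = parse_roles("[ARG0 John] broke [ARG1 the window]")
inchoative = parse_roles("[ARG1 The window] broke")

# The same participant is ARG1 whether it surfaces as object or subject.
same_arg1 = filler(transitive, "ARG1") == filler(inchoative, "ARG1")
```

In both sentences the window is recovered as ARG1, even though it is the syntactic object in one and the subject in the other.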
In PropBank, semantic roles are defined on a
verb-by-verb basis. An individual verb’s seman-
tic arguments are simply numbered, beginning with
0. Polysemous verbs have several framesets, cor-
responding to a relatively coarse notion of word
senses, with a separate set of numbered roles, a role-
set, defined for each Frameset. For instance, leave
has both a DEPART Frameset ([ARG0 John] left
[ARG1 the room]) and a GIVE Frameset, ([ARG0
I] left [ARG1 my pearls] [ARG2 to my daughter-in-
law] [ARGM-LOC in my will].) While most Frame-
sets have three or four numbered roles, as many
as six can appear, in particular for certain verbs of
motion. Verbs can take any of a set of general,
adjunct-like arguments (ARGMs), such as LOC (lo-
cation), TMP (time), DIS (discourse connectives),
PRP (purpose) or DIR (direction). Negations (NEG)
and modals (MOD) are also marked.
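The relationship between a lemma, its Framesets, and their rolesets can be sketched as a small data structure. This is an illustration only: the class `Frameset`, the helper `valid_label`, and the role glosses (for example "entity leaving") are assumptions made here and are not taken from the actual Frame Files; only the Frameset names and the ARGM tags come from the text.

```python
from dataclasses import dataclass, field


@dataclass
class Frameset:
    lemma: str
    name: str
    # Numbered role -> informal gloss (glosses are illustrative assumptions).
    roles: dict = field(default_factory=dict)


# General adjunct-like tags mentioned in the text.
ARGM_TAGS = {"LOC", "TMP", "DIS", "PRP", "DIR", "NEG", "MOD"}

leave_depart = Frameset("leave", "DEPART",
                        {"ARG0": "entity leaving", "ARG1": "place left"})
leave_give = Frameset("leave", "GIVE",
                      {"ARG0": "giver", "ARG1": "thing given", "ARG2": "recipient"})


def valid_label(frameset, label):
    """Numbered roles must belong to the roleset; ARGMs are verb-independent."""
    if label.startswith("ARGM-"):
        return label[len("ARGM-"):] in ARGM_TAGS
    return label in frameset.roles
```

Under these assumptions, ARG2 is licensed by the GIVE Frameset of leave but not by the DEPART Frameset, while ARGM-LOC is available to both.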
There are several other annotation projects,
FrameNet (Baker et al., 1998), Salsa (Ellsworth et
al., 2004), and the Prague Tectogrammatics (Hajicova
and Kucerova, 2002), that share similar goals.
Berkeley’s FrameNet project (Baker et al., 1998;
Fillmore and Atkins, 1998; Johnson et al., 2002)
is committed to producing rich semantic frames on
which the annotation is based, but it is less con-
cerned with annotating complete texts, concentrat-
ing instead on annotating a set of examples for each
predicator (including verbs, nouns and adjectives),
and attempting to describe the network of relations
among the semantic frames. For instance, the buyer
of a buy event and the seller of a sell event would
both be Arg0s (Agents) in PropBank, while in
FrameNet one is the BUYER and the other is the
SELLER. The Salsa project (Ellsworth et al., 2004)
in Germany is producing a German lexicon based
on the FrameNet semantic frames and annotating a
large German newswire corpus. PropBank style an-
notation is being used for verbs which do not yet
have FrameNet frames defined.
The PropBank annotation philosophy has been
extended to the Penn Chinese Proposition Bank
(Xue and Palmer, 2003). The Chinese PropBank an-
notation is performed on a smaller (250k words) and
yet growing corpus annotated with syntactic struc-
tures (Xue et al., To appear). The same syntac-
tic alternations that form the basis for the English
PropBank annotation also exist in robust quantities
in Chinese, even though it may not be the case that
the same exact verbs (meaning verbs that are close
translations of one another) have the exact same
range of syntactic realization for Chinese and English.
For example, in (1), “新年/New Year 招待会/reception”
plays the same role in (a) and (b), which
is the event or activity held, even though it occurs in
different syntactic positions. Assigning the same ar-
gument label, Arg1, to both instances, captures this
regularity. It is worth noting that the predicate “举行/hold”
does not have passive morphology in (1a),
despite what its English translation suggests. Like
the English PropBank, the adjunct-like elements re-
ceive more general labels like TMP or LOC, as also
illustrated in (1). The functional tags for Chinese
and English PropBanks are to a large extent similar
and more details can be found in (Xue and Palmer,
2003).
(1) a. [ARG1 新年/New Year 招待会/reception] [ARGM-TMP 今天/today]
[ARGM-LOC 在/at 钓鱼台/Diaoyutai 国宾馆/state guest house] 举行/hold
“The New Year reception was held in Diaoyutai
State Guest House today.”
b. [ARG0 唐家旋/Tang Jiaxuan] [ARGM-TMP 今天/today]
[ARGM-LOC 在/at 钓鱼台/Diaoyutai 国宾馆/state guest house] 举行/hold
[ARG1 新年/New Year 招待会/reception]
“Tang Jiaxuan was holding the New Year reception in
Diaoyutai State Guest House today.”
3 A Parallel PropBank II
As discussed above, PropBank II adds richer se-
mantic annotation to the PropBank I predicate ar-
gument structures, notably eventuality variables,
co-references, coarse-grained sense tags (Babko-
Malaya et al., 2004; Babko-Malaya and Palmer,
2005), and discourse connectives (Xue, To appear).
To create our parallel PropBank II, we began with
the first 100K words of the Chinese Treebank which
had already been propbanked, and which we had
had translated into English. The English transla-
tion was first treebanked and then propbanked, and
we are now in the process of adding the PropBank
II annotation to both the English and the Chinese
propbanks. We will discuss our progress on each of
the three individual components of PropBank II in
turn, bringing out translation issues along the way
that have been highlighted by the additional annotation.
In general we find that this level of abstraction
facilitates the alignment of the source and target
language descriptions: event IDs and event
coreferences simplify the mappings between verbal
and nominal events; English coarse-grained sense
tags correspond to unique Chinese lemmas; and dis-
course connectives correspond well.
3.1 Eventuality variables
Positing eventuality¹ variables provides a straightforward
way to represent the semantics of adverbial
modifiers of events and capture nominal and
pronominal references to events. Given that the arguments
and adjuncts for the verbs are already annotated
in PropBank I, adding eventuality variables
is for the most part straightforward. The example
in (2) illustrates a PropBank I annotation, which is
identified with a unique event ID in PropBank II.
¹ The term ‘eventuality’ is used here to refer to events and
states.
(2) a. Mr. Bush met him privately in the White House on
Thursday.
b. PropBank I: Rel: met, Arg0: Mr. Bush, Arg1: him,
ArgM-MNR: privately, ArgM-LOC: in the White
House, ArgM-TMP: on Thursday.
c. PropBank II: ∃e meeting(e) & Arg0(e, Mr. Bush) &
Arg1(e, him) & MNR(e, privately) & LOC(e, in the
White House) & TMP(e, on Thursday).
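The step from (2b) to (2c) is mechanical once an event variable is posited, and can be sketched as follows. The function name `to_event_formula` is hypothetical, and the predicate name ("meeting" for the relation "met") is supplied by hand here; how the project derives it is not specified in the text.

```python
def to_event_formula(predicate, args, var="e"):
    """Render a PropBank I style annotation as an event-variable formula.

    `args` is a list of (label, filler) pairs; ArgM-X labels are shortened
    to X, as in example (2c).
    """
    conjuncts = [f"{predicate}({var})"]
    for label, filler in args:
        name = label[len("ArgM-"):] if label.startswith("ArgM-") else label
        conjuncts.append(f"{name}({var}, {filler})")
    return f"∃{var} " + " & ".join(conjuncts)


propbank_one = [
    ("Arg0", "Mr. Bush"),
    ("Arg1", "him"),
    ("ArgM-MNR", "privately"),
    ("ArgM-LOC", "in the White House"),
    ("ArgM-TMP", "on Thursday"),
]
formula = to_event_formula("meeting", propbank_one)
```

Each argument and adjunct becomes a two-place predicate over the same variable, which is what lets adverbial modifiers and later references attach to the event itself.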
Annotation of event variables starts by automatically
associating all PropBank I annotations
with potential event IDs. Since not all annotations
actually denote eventualities, we manually filter
out selected classes of verbs. We further attempt
to identify all nouns and nominals that describe
eventualities, as well as all sentential arguments of
the verbs that refer to events. Finally, part
of the PropBank II annotation involves tagging
event coreference for pronouns as well as empty
categories. All these tasks are discussed in more
detail below.
Identifying event modifiers. The actual annotation
starts from the presumption that all verbs are
events or states and nouns are not. All the verbs in
the corpus are automatically assigned a unique event
identifier, and the manual part of the task becomes (i)
identification of verbs or verb senses that do not
denote eventualities and (ii) identification of nouns that do
denote events. For example, in (3), begin is an
aspectual verb that does not introduce an event variable,
but rather modifies the verb ‘take’, as is
supported by the fact that it is translated as an
adverb “初/initially” in the corresponding Chinese
sentence.
(3) 重点/key 发展/develop 的/DE 医药/medicine 与/and 生物/biology
技术/technology, 新/new 技术/technology,
新/new 材料/material, 计算机/computer 及/and 应用/application,
光/photo 电/electric 一体化/integration
等/etc. 产业/industry 已/already 初/initially 具/take 规模/shape.
“Key developments in industries such as medicine,
biotechnology, new materials, computer and its applications,
photoelectric integration, etc. have begun to take
shape.”
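The two-stage procedure described above (automatic assignment to every verb, followed by manual corrections in both directions) might be sketched as below. The lists `NON_EVENTIVE` and `EVENT_NOUNS` stand in for the annotators' manual decisions and are hypothetical; only "begin" as an aspectual verb is taken from example (3), and the part-of-speech codes are simplified assumptions.

```python
# Hypothetical stand-ins for manual annotation decisions.
NON_EVENTIVE = {"begin"}          # aspectual verbs that modify another event
EVENT_NOUNS = {"development"}     # nominalizations judged to denote events


def auto_assign(tokens):
    """Stage 1: every verb is a candidate eventuality."""
    return {i for i, (_, pos) in enumerate(tokens) if pos == "V"}


def apply_manual_decisions(tokens, candidates):
    """Stage 2: drop non-eventive verbs, add event nouns, number the result."""
    kept = {i for i in candidates if tokens[i][0] not in NON_EVENTIVE}
    added = {i for i, (lemma, pos) in enumerate(tokens)
             if pos == "N" and lemma in EVENT_NOUNS}
    return {i: f"e{n}" for n, i in enumerate(sorted(kept | added), start=1)}


# Simplified (lemma, part of speech) pairs for the English side of (3).
tokens = [("development", "N"), ("begin", "V"), ("take", "V"), ("shape", "N")]
candidates = auto_assign(tokens)
event_ids = apply_manual_decisions(tokens, candidates)
```

With these toy inputs, "begin" loses its candidate status, "take" keeps its event ID, and the noun "development" gains one.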
Nominalizations as events. Although most nouns
do not introduce eventualities, some do, and these
nouns are generally nominalizations². This is true
for both English and Chinese, as is illustrated in (4).
Both “发展/develop” and “深入/deepening” are
nominalized verbs that denote events. Having a parallel
propbank annotated with event variables allows
us to see how events are lined up in the two languages
and how their lexical realizations can vary.
The nominalized verbs in Chinese can be translated
into verbs or their nominalizations, as is shown in
the alternative translations of the Chinese original
in (4). What makes this particular example even
more interesting is the fact that the adjective modifier
of the events, “不断/continued”, can actually
be realized as an aspectual verb in English.
The semantic representations of the PropBank II annotation,
however, are preserved: both the aspectual
verb “continue” in English and the adjective
“不断/continued” in Chinese are modifiers of the
events denoted by “发展/development” and
“深入/deepening”.
² The problem of identifying nouns which denote events is
addressed as part of the sense tagging. Detailed discussion
can be found in (Babko-Malaya and Palmer, 2005).
(4) 随着/with 中国/China 经济/economy 的/DE 不断/continued
发展/development 和/and 对/to 外/outside
开放/open 的/DE 不断/continued 深入/deepen …
“As China’s economy continues to develop and
its practice of opening to the outside continues to
deepen …”
“With the continued development of China’s economy
and the continued deepening of its practice of opening to
the outside …”
Event coreference. Another aspect of the event
variable annotation involves identifying pronominal
expressions that corefer with events. These pronominal
expressions may be overt, as in the Chinese
example in (5), while others correspond to null
pronouns, marked as *pro*³ in the Treebank annotations,
as in (6):
(5) 而且/additionally, 出口/export 商品/commodity 结构/structure
继续/continue 优化/optimize, 去年/last year
工业/industry 制成品/finished product 出口/export
额/quota 占/account for 全国/entire country
出口/export 总额/quantity 的/DE 比重/proportion
达/reach 百分之八十五点六/85.6 percent, 这/this
充分/clearly 表明/indicate 中国/China 工业/industry
产品/product 的/DE 制造/produce 水平/level 比/compared
with 过去/past 有/have 了/LE 很/very 大/big
提高/improvement.
“Moreover, the structure of export commodities
continues to optimize, and last year’s export volume
of manufactured products accounts for 85.6 percent of
the whole country’s exports, *pro* clearly indicating
that China’s industrial product manufacturing level has
improved.”
(6) 这些/these 成果/achievement 中/among 有/have
一百三十八/138 项/item 被/BEI 企业/enterprise
应用/apply 到/to 生产/production 上/on “点石成金/spin
gold from straw”, *pro* 大大/greatly 提高/improve
了/ASP 中国/China 镍/nickel 工业/industry 的/DE
生产/production 水平/level.
“Among these achievements, 138 items have been applied
to production by enterprises to spin gold from straw,
which greatly improved the production level of China’s
nickel industry.”
³ The small *pro* and big *PRO* distinction made in the
Chinese Treebank is exploratory in nature. The idea is that it is
easier to erase this distinction if it turns out to be implausible or
infeasible than to add it if it turns out to be important.
It is not the case, however, that overt pronouns in
Chinese will always correspond to overt pronouns
in English. In (5), the overt pronoun “这/this” in
Chinese corresponds to a null pronoun in English
at the beginning of a reduced relative clause, while
in (6), the null pronoun in Chinese is translated into
a relative pronoun “which” that introduces a relative
clause. In other cases, neither language has an
overt pronoun, although one is posited in the treebank
annotation, as in (7).
(7) 去年/last year, 纽约/New York 新/new 上市/list 的/DE
外国/foreign 企业/enterprise 共/altogether 有/have 61/61
家/CL, *pro* 创/create 历年/recent year 来/since
最高/highest 纪录/record.
“Last year, there were 61 new foreign enterprises listed
on the New York Stock Exchange, *PRO* creating the highest
record in history.”
Having a parallel propbank annotated with event
variables allows us to examine how the same events
are lexicalized in English and Chinese and how they
align, whether they have been indicated by verbs or
nouns.
3.2 Grouped sense tags
In general, the verbs in the Chinese PropBank are
less polysemous than the English PropBank verbs,
with the vast majority of the lemmas having just one
Frameset. On the other hand, the Chinese PropBank
has more lemmas relative to corpus size (including
stative verbs, which are generally translated into
adjectives in English). The Chinese PropBank
has 4854 lemmas in just the 250K words that have been
propbanked, while the English PropBank has
only 3635 lemmas in its entire 1-million-word corpus.
Of the 4854 Chinese lemmas, only 62
have 3 or more framesets. In contrast, 294 lemmas
have 3 or more framesets in the English PropBank.
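The comparison can be made explicit by normalizing the counts quoted above. The short sketch below only restates the figures given in the text; the variable and function names are illustrative.

```python
# Counts as reported in the text.
corpora = {
    "Chinese": {"lemmas": 4854, "words": 250_000, "three_plus": 62},
    "English": {"lemmas": 3635, "words": 1_000_000, "three_plus": 294},
}


def summarize(stats):
    """Return (lemmas per 1,000 words, % of lemmas with 3+ framesets)."""
    per_thousand = stats["lemmas"] / stats["words"] * 1000
    share = stats["three_plus"] / stats["lemmas"] * 100
    return round(per_thousand, 1), round(share, 1)


summary = {name: summarize(stats) for name, stats in corpora.items()}
# summary == {"Chinese": (19.4, 1.3), "English": (3.6, 8.1)}
```

On these figures the Chinese PropBank has roughly five times as many verb lemmas per thousand words, while the share of lemmas with three or more framesets is about six times higher in English.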
Verb | English senses | Chinese translations
appear | be or have a quality of being | 显得, 呈现
 | come forth, become known or visible, physically or figuratively | 出现, 呈现
 | present oneself formally, usually in a legal setting | 露面
fight | combat or oppose | 打好, 战斗, 抗
 | strive, make a strenuous effort | 奋斗
 | promote, campaign or crusade | 奋斗
join | connect, link or unite separate things, physically or abstractly | 衔接, 接轨
 | enlist or accept membership within some group or organization | 走进, 参加, 加入
 | participate with someone else in some event | 同...一道, 同...一起
realize | be cognizant of, comprehend, perceive | 认识, 意识
 | actualize, make real | 实现
 | take in, earn, acquire | 实现
pass | travel by | 经
 | clear, come through, succeed | 通过
 | elapse, happen | 过去, 期满
 | communicate | 传出
settle | resolve, finalize, accept | 解决
 | reside, inhabit | 进驻, 落户
raise | increase | 提高
 | lift, elevate, orient upwards | 仰
 | collect, levy | 募集, 筹集, 筹措
 | invoke, elicit, set off | 提, 提出
Table 1: English verbs and their translations in the parallel Propbank
In our sense-tagging part of the project, we have
been using manual groupings of the English Word-
Net senses. These groupings were previously shown
to reconcile a substantial portion of the tagging dis-
agreements, raising inter-annotator agreement from
71% in the case of fine-grained WordNet senses to
82% in the case of grouped senses for the Sense-
val 2 English data (Palmer et al., to appear), and
currently to 89% for 93 new verbs (almost 12K in-
stances) (Palmer et al., 2004). The question which
arises, however, is how useful these grouped senses
are and whether the level of granularity which they
provide is sufficient for such applications as machine
translation from English to Chinese.
In a preliminary investigation, we randomly selected
7 verbs and 5 nouns and looked at their corresponding
translations in the Chinese PropBank. As
Tables 1 and 2 show, for 6 verbs (join, pass, settle,
raise, appear, fight) and 3 nouns (resolution,
organization, development), grouped English senses
map to unique Chinese translation sets. For a few
examples, which include realize and party, grouped
senses map to the same word in Chinese, preserving
the ambiguity. This investigation supports the
appropriateness of the grouped sense tags and indicates
their potential for providing a useful level of granularity
for MT.
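The check behind this investigation can be sketched as a search for Chinese translations shared by more than one grouped English sense. The data below are the entries for join and realize as they appear in Table 1, with sense labels abbreviated; the function name `ambiguous_translations` is a hypothetical helper, and the procedure is an assumption about how such a comparison could be done, not a description of the authors' method.

```python
from collections import defaultdict

# Grouped sense -> set of Chinese translations (entries from Table 1).
TRANSLATIONS = {
    "join": {
        "connect": {"衔接", "接轨"},
        "enlist": {"走进", "参加", "加入"},
        "participate": {"同...一道", "同...一起"},
    },
    "realize": {
        "be cognizant of": {"认识", "意识"},
        "actualize": {"实现"},
        "take in": {"实现"},
    },
}


def ambiguous_translations(senses):
    """Return {translation: senses} for translations shared by 2+ senses."""
    by_word = defaultdict(set)
    for sense, words in senses.items():
        for word in words:
            by_word[word].add(sense)
    return {word: s for word, s in by_word.items() if len(s) > 1}


join_overlap = ambiguous_translations(TRANSLATIONS["join"])
realize_overlap = ambiguous_translations(TRANSLATIONS["realize"])
```

For join no translation is shared across senses, whereas for realize the translation 实现 covers two senses, which is the case where the English ambiguity is preserved in Chinese.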
Noun | English senses | Chinese translations
organization | individuals working together | 组织, 机构, 单位
 | event: putting things together | 筹组
 | state: the quality of being well-organized | 组织
party | event: an occasion on which people can assemble for social interaction and entertainment | 会
 | political organization | 党派
 | a band of people associated temporarily in some activity | 方
 | person or side in legal context |
investment | time or money risked in hopes of profit | 投资, 资
 | the act of investing | 投资
development | the process of development | 开发, 发展
 | the act of development | 发展
resolution | a formal declaration | 协议, 决定
 | coming to a solution | 解决
Table 2: English nouns and their translations in the parallel Propbank
3.3 Discourse connectives
Another component of the Chinese/English Parallel
PropBank II is the annotation of discourse connectives
for both the Chinese corpus and its English translation.
Like the other two components, the annotation
is performed on the first 100K words of the
Parallel Chinese English Treebank. The annotation
of Chinese discourse connectives follows in large
part the theoretical assumptions and annotation practices
of the English Penn Discourse Treebank (PDTB)
(Miltsakaki et al., 2004). Adaptations are made only
when they are warranted by the linguistic facts of
Chinese. While the English PDTB annotates both
explicit and implicit discourse connectives, our initial
focus is on explicit discourse connectives.
Explicit discourse connectives include subordinate (8)
and coordinate conjunctions (9) as well as discourse
adverbials (10). While subordinate and coordinate
conjunctions are easy to understand, discourse adverbials
need a little more elaboration. Discourse
adverbials differ from other adverbials in that they
relate two propositions. Typically one can be found
in the immediate context while the other may need
to be identified in the previous discourse.
(8) [arg1 台湾/Taiwan 商人/businessman] [conn 虽然/although]
[arg1 生活/live 在/at 外/foreign land],
[arg2 还是/still 很/very 注重/stress 孩子/child
教育/education].
“Although these Taiwan businessmen live away from
home, they still stress the importance of their children’s
education.”
(9) [arg1 东亚/East Asia 各/every 国/country 间/among 并非/not
really 完全/completely 没有/not have 矛盾/conflict
和/and 分歧/difference], [conn 但是/but] [arg2 为了/for
保障/protect 东亚/East Asia 各/every 国/country 的/DE
利益/interest, 必须/must 进一步/further 加强/strengthen
东亚/East Asia 合作/cooperation].
“It is not really true that there are no conflicts and differences
among the East Asian countries, but in order to
protect their common interest, they must cooperate.”
(10) [arg1 浦东/Pudong 开发/development 是/BE 一/one
项/CL 振兴/invigorate 上海/Shanghai 的/DE 跨/across
世纪/century 工程/project], [conn 因此/therefore] [arg2
大量/large quantity 出现/appear 的/DE 是/BE 新/new
问题/problem].
“The development of Pudong, a project
designed to invigorate Shanghai, spans over different
centuries. Therefore, new problems occur in large quantities.”
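Each annotated connective in (8) to (10) relates two arguments, and an argument may be discontinuous, as arg1 is in (8). A minimal sketch of a record for such an annotation follows. The class `Relation` and the function `read_relation` are hypothetical names, and the input uses only the English glosses of (8) for readability; the project's actual annotation format is not described in the text.

```python
import re
from dataclasses import dataclass

# Matches "[arg1 ...]", "[arg2 ...]" or "[conn ...]".
SPAN = re.compile(r"\[(arg1|arg2|conn)\s+([^\]]+)\]")


@dataclass
class Relation:
    conn: str
    arg1: list  # one entry per (possibly discontinuous) arg1 span
    arg2: list


def read_relation(annotated):
    """Collect the connective and its argument spans from a bracketed string."""
    parts = {"conn": [], "arg1": [], "arg2": []}
    for label, text in SPAN.findall(annotated):
        parts[label].append(text.strip())
    if len(parts["conn"]) != 1:
        raise ValueError("expected exactly one connective")
    return Relation(parts["conn"][0], parts["arg1"], parts["arg2"])


# English glosses of example (8).
glossed = ("[arg1 Taiwan businessman] [conn although] "
           "[arg1 live at foreign land], [arg2 still very stress child education]")
relation = read_relation(glossed)
```

Keeping the argument spans as lists preserves the fact that the connective in (8) sits inside its first argument rather than between the two.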
The annotation of the discourse connectives in a
parallel English Chinese PropBank exposes interesting
correspondences between English and Chinese
discourse connectives. The examples in (11) show
that “结果” is polysemous and corresponds to
different expressions in English. It is a noun meaning
“result” in (11a), where it is not a discourse
connective. In (11b) it means “in the end”,
invoking a contrast between what has been planned
and how the actual result turned out. In (11c) it
means “as a result”, expressing causality between
the cause and the result.
(11) a. 实行/adopt “戒急用忍/go slow” 的/DE 政策/policy,
结果/result 是/BE 凭白/unnecessarily 丢失/lose
在/at 大陆/mainland 的/DE 商机/business
opportunity.
“The result of adopting the ‘go slow’ policy is
unnecessarily losing business opportunities in the
mainland.”
b. 纤维所/fiber institute 计划/plan 招收/enroll 十/10
名/CL 学生/student, 结果/in the end 只/only
有/have 二十/20 人/person 报名/register.
“The fiber institute planned to enroll 10 students. In
the end, only 20 people registered to take the exam.”
c. 学校/school 不/not 教/teach 理财/finance management,
一般/ordinary 人/people 又/and 有/have
这/this 方面/aspect 的/DE 需求/need, 结果/as a
result, 报章/newspaper 上/on 各/every 种/kind
专栏/column 就/then 成为/become 资讯/information
的/DE 主要/main 来源/source.
“The school does not teach finance management and
ordinary people have this need. As a result, the different
kinds of columns in the newspaper become the
main source of information.”
4 Conclusion
This paper presented preliminary results of the parallel
PropBank II project. It highlighted some interesting
aspects of the differences between English
and Chinese, which play an important role for MT
and other applications. The questions addressed
include how events are lexicalized
and aligned in the two languages, which level of
sense granularity is needed for MT from English
to Chinese, and how discourse connectives differ
between the two languages.
Further investigation and alignment of the parallel
corpus, as well as richer annotation, will reveal other
interesting phenomena.

References
Olga Babko-Malaya and Martha Palmer. 2005. Propo-
sition Bank II: Delving Deeper. In Frontiers in
Corpus Annotation, Workshop in conjunction with
HLT/NAACL 2004, Boston, Massachusetts.
Olga Babko-Malaya, Martha Palmer, Nianwen Xue, Aravind
Joshi, and Seth Kulick. 2004. Exploiting Interactions
between Different Types of Semantic Annotation.
In Proceedings of ICWS-6, Tilburg, The
Netherlands.
C. Baker, C. Fillmore, and J. Lowe. 1998. The Berkeley
FrameNet project. In Proceedings of COLING-ACL,
Singapore.
E. Charniak. 2001. Immediate-head Parsing for Lan-
guage Models. In ACL-01.
Michael Collins. 1999. Head-driven Statistical Models
for Natural Language Parsing. Ph.D. thesis, Univer-
sity of Pennsylvania.
M. Ellsworth, K. Erk, P. Kingsbury, and S. Pado. 2004.
PropBank, SALSA and FrameNet: How design de-
termines product. In Proceedings of the LREC 2004
Workshop on Building Lexical Resources from Seman-
tically Annotated Corpora, Lisbon, Portugal.
Charles J. Fillmore and B. T. Atkins. 1998. FrameNet
and lexical relevance. In Proceedings of the First
International Conference on Language Resources and
Evaluation, Granada, Spain.
Eva Hajicova and Iyona Kucerova. 2002. Argu-
ment/Valency Structure in PropBank, LCS Database
and Prague Dependency Treebank: A Comparative Pi-
lot Study. In Proceedings of the Third International
Conference on Language Resources and Evaluation,
pages 846–851.
Christopher R. Johnson, Charles J. Fillmore, Miriam
R. L. Petruck, Collin Baker, Michael Ellsworth,
Josef Ruppenhofer, and Esther J. Wood. 2002.
FrameNet: Theory and Practice, Version 1.0,
www.icsi.berkeley.edu/framenet.
M. Marcus, B. Santorini, and M. A. Marcinkiewicz.
1993. Building a Large Annotated Corpus of English:
the Penn Treebank. Computational Linguistics.
Mitchell Marcus, Grace Kim, Mary Ann Marcinkiewicz,
et al. 1994. The Penn Treebank: Annotating Predi-
cate Argument Structure. In Proc of ARPA speech and
Natural language workshop.
E. Miltsakaki, R. Prasad, A. Joshi, and B. Webber. 2004.
The Penn Discourse Treebank. In Proceedings of the
4th International Conference on Language Resources
and Evaluation, Lisbon, Portugal.
Martha Palmer, Olga Babko-Malaya, and Hoa Dang.
2004. Different Sense Granularities for Different Ap-
plications. In Proceedings of the 2nd Workshop on
Scalable Natural Language Understanding Systems,
Boston, Mass.
Martha Palmer, Dan Gildea, and Paul Kingsbury. 2005.
The proposition bank: An annotated corpus of seman-
tic roles. Computational Linguistics, 31(1).
Martha Palmer, Hoa Trang Dang, and Christiane Fell-
baum. to appear. Making fine-grained and coarse-
grained sense distinctions, both manually and auto-
matically. Journal of Natural Language Engineering.
Nianwen Xue and Martha Palmer. 2003. Annotating the
Propositions in the Penn Chinese Treebank. In The
Proceedings of the 2nd SIGHAN Workshop on Chinese
Language Processing, Sapporo, Japan.
Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha
Palmer. To appear. The Penn Chinese Treebank:
Phrase Structure Annotation of a Large Corpus. Natural
Language Engineering.
Nianwen Xue. To appear. Annotating the Discourse
Connectives in the Chinese Treebank. In Proceedings
of the ACL Workshop on Frontiers in Corpus Annota-
tion, Ann Arbor, Michigan.
