Proceedings of the Workshop on Frontiers in Corpus Annotation II: Pie in the Sky, pages 84–91,
Ann Arbor, June 2005. ©2005 Association for Computational Linguistics
Annotating Discourse Connectives in the Chinese Treebank ∗
Nianwen Xue
Department of Computer and Information Science
University of Pennsylvania
xueniwen@linc.cis.upenn.edu
Abstract
In this paper we examine the issues that arise from the annotation of discourse connectives for the Chinese Discourse Treebank Project. This project is based on the same principles as the PDTB, a project that annotates the English discourse connectives in the Penn Treebank. The paper begins by outlining the range of discourse connectives under consideration in this project and examines the distribution of the explicit discourse connectives. We then examine the types of syntactic units that can be arguments to the discourse connectives. We show that one of the most challenging issues in this type of discourse annotation is determining the textual spans of the arguments, and that this is partly due to the hierarchical nature of discourse relations. Finally, we discuss sense discrimination of the discourse connectives, which involves separating discourse connective senses from non-discourse connective senses and teasing apart the different discourse connective senses, and discourse connective variation, the use of different connectives to represent the same discourse relation.
∗I thank Aravind Joshi and Martha Palmer for their comments. All errors are my own, of course.
1 Introduction
The goal of the Chinese Discourse Treebank
(CDTB) Project is to add a layer of discourse anno-
tation to the Penn Chinese Treebank (Xue et al., To
appear), the bulk of which has also been annotated
with predicate-argument structures. This project
is focused on discourse connectives, which include
explicit connectives such as subordinate and coor-
dinate conjunctions, discourse adverbials, as well
as implicit discourse connectives that are inferable
from neighboring sentences. Like the Penn English
Discourse Treebank (Miltsakaki et al., 2004a; Milt-
sakaki et al., 2004b), the CDTB project adopts the
general idea presented in (Webber and Joshi, 1998;
Webber et al., 1999; Webber et al., 2003) where
discourse connectives are considered to be predi-
cates that take abstract objects such as propositions,
events and situations as their arguments. This ap-
proach departs from the previous approaches to dis-
course analysis such as the Rhetorical Structure The-
ory (Mann and Thompson, 1988; Carlson et al.,
2003) in that it does not start from a predefined in-
ventory of abstract discourse relations. Instead, all
discourse relations are lexically grounded and an-
chored by a discourse connective. The discourse
relations so defined can be structural or anaphoric.
Structural discourse relations, generally anchored by
subordinate and coordinate conjunctions, hold lo-
cally between two adjacent units of discourse (such
as clauses). In contrast, anaphoric discourse rela-
tions are generally anchored by discourse adverbials
and only one argument can be identified structurally
in the local context while the other can only be de-
rived anaphorically in the previous discourse. An
advantage of this approach to discourse analysis is that discourse relations can be built up incrementally in a bottom-up manner. This advantage is magnified in large-scale annotation projects, where inter-annotator agreement is crucial, and it has been borne out in the construction of the Penn English Discourse Treebank (Miltsakaki et al., 2004a). This approach closely parallels the annotation of the verbs in the English and Chinese Propbanks (Palmer et al., 2005; Xue and Palmer, 2003), where verbs are the anchors of predicate-argument structures. The difference is that the arity of discourse connectives is fixed (they always take two arguments), while the extents of their arguments are far less certain.
This paper outlines the issues that arise from the
annotation of Chinese discourse connectives, with
an initial focus on explicit discourse connectives.
Section 2 gives an overview of the different kinds
of discourse connectives that we plan to annotate
for the CDTB Project. Section 3 surveys the dis-
tribution of the discourse connectives and Section
4 describes the kinds of discourse units that can be
arguments to the discourse connectives. Section 5
specifies the scope of the arguments of discourse re-
lations and describes what should be included in or
excluded from the text span of the arguments. Sec-
tions 6 and 7 describe the need for a mechanism
to address sense disambiguation and discourse con-
nective variation, drawing evidence from examples
of explicit discourse connectives. Finally, Section 8
concludes this paper.
2 Overview of Chinese Discourse
Connectives
Under our theoretical framework, a discourse connec-
tive is viewed as a predicate taking two abstract ob-
jects such as propositions, events, or situations as
its arguments. A discourse connective can be ei-
ther explicit or implicit. An explicit discourse con-
nective is realized in the form of one lexical item
or several lexical items while an implicit discourse
connective must be inferred between adjacent dis-
course units. Typical explicit discourse connectives
are subordinate and coordinate conjunctions as well
as discourse adverbials. While the arguments for
subordinate and coordinate conjunctions are gener-
ally local, the first argument for a discourse adver-
bial may need to be identified long-distance in the
previous discourse.
2.1 Subordinate conjunctions
There are two types of subordinate conjunctions in
Chinese, single and paired. With single subordi-
nate conjunctions, the subordinate conjunction in-
troduces the subordinate clause, as in (1). By con-
vention, the subordinate clause is labeled ARG1 and
the main clause is labeled ARG2. The subordinate
conjunction is NOT included as part of the argu-
ment. The subordinate clause generally precedes the
main clause in Chinese, but occasionally it can also
follow the main clause. The assignment of the argu-
ment labels to the discourse units is independent of
their syntactic distributions. The subordinate clause
is always labeled ARG1 whether it precedes or fol-
lows the main clause.
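The labeling convention above can be stated as a small rule. The following Python sketch is illustrative only and is not part of any CDTB tooling: it assumes the two discourse units have already been segmented, the connective has been stripped (it is not part of either argument), and the annotator has marked which unit the subordinate conjunction introduces. The function name `label_subordinate` is hypothetical.

```python
from typing import Dict, List


def label_subordinate(units: List[str], sub_index: int) -> Dict[str, str]:
    """Assign ARG1/ARG2 for a relation anchored by a subordinate conjunction.

    units: the two discourse units in linear order, connective removed.
    sub_index: position (0 or 1) of the unit introduced by the conjunction.
    The subordinate unit is always ARG1, whatever its linear position.
    """
    if len(units) != 2 or sub_index not in (0, 1):
        raise ValueError("expected two units and sub_index 0 or 1")
    return {"ARG1": units[sub_index], "ARG2": units[1 - sub_index]}


# English glosses of example (1): subordinate clause first (the usual order).
print(label_subordinate(
    ["economic and financial policy effective",
     "Asian economy expected to begin recovering in 1999"], 0))
# The same clauses with the subordinate clause second: the labels do not change.
print(label_subordinate(
    ["Asian economy expected to begin recovering in 1999",
     "economic and financial policy effective"], 1))
```

Both calls return the same labeling, which is the point of the convention: the label reflects the syntactic role of the unit, not its position.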
Single subordinate conjunctions: Single subordinate conjunctions are very much like those in English, where the subordinate clause is introduced by a subordinate conjunction:
(1) cjkB1A8cjkB8E6
report
cjkC8CFcjkCEAA
believe
cjkA3AC
,
[conn cjkC8E7cjkB9FB
if
] [arg1 cjkBEADcjkBCC3
economic
cjkBACD
and
cjkBDF0cjkC8DA
financial
cjkD5FEcjkB2DF
policy
cjkB5C3cjkC1A6
effective
]cjkA3AC
,
[arg2cjkD1C7cjkD6DE
Asia
cjkB5D8cjkC7F8
region
cjkBEADcjkBCC3
economy
cjkBFC9cjkCDFB
expect
cjkD4DA
in
cjkA3B1cjkA3B9cjkA3B9cjkA3B9cjkC4EA
1999
cjkBFAAcjkCABC
begin
cjkBBD8cjkC9FD
recover
]cjkA1A3
.
”The report believes that if the economic and financial policies are effective, the Asian economy is expected to recover in 1999.”
Paired subordinate conjunctions: Chinese also
abounds in paired subordinate conjunctions, where
the subordinate conjunction introduces the subordi-
nate clause and another discourse connective intro-
duces the main clause, as in (2). In this case, the dis-
course connectives are considered to be paired and
jointly anchor ONE discourse relation.
(2) [conn cjkC8E7cjkB9FB
if
] [arg1 cjkB8C4cjkB8EF
reform
cjkB4EBcjkCAA9
measure
cjkB2BB
not
cjkB5C3cjkC1A6
effective
cjkA3AC
,
cjkD0C5cjkD0C4
confidence
cjkCEA3cjkBBFA
crisis
cjkD2C0cjkC8BB
still
cjkB4E6cjkD4DA
exist
]cjkA3AC
,
[conn cjkC4C7cjkC3B4
then
] [arg2
cjkCDB6cjkD7CAcjkD5DF
investor
cjkBECD
will
cjkD3D0
have
cjkBFC9cjkC4DC
possibility
cjkB0D1
BA
cjkD7A2cjkD2E2cjkC1A6
attention
cjkD7AAcjkCFF2
turn
cjkC6E4cjkCBFB
other
cjkD0C2cjkD0CB
emerging
cjkCAD0cjkB3A1
market
]cjkA1A3
.
”If the reform measures are not effective and the confidence crisis still exists, then investors are likely to turn their attention to other emerging markets.”
85
Modified discourse connectives: Like English,
some subordinate conjunctions can be modified by
an adverb, as illustrated in (3). Note that the subordi-
nate conjunction is in clause-medial position. When
this happens, the first argument, ARG1 in this case,
becomes discontinuous. Both portions of the argu-
ment, the one that comes before the subordinate con-
junction and the one after, are considered to be part
of the same argument.
(3) [arg1 cjkC8A5cjkC4EA
last year
cjkB3F5
beginning
cjkC6D6cjkB6AB
Pudong
cjkD0C2cjkC7F8
new district
cjkB5AEcjkC9FA
open
cjkB5C4
DE
cjkD6D0cjkB9FA
China
cjkB5DAcjkD2BB
first
cjkBCD2
CL
cjkD2BDcjkC1C6
medical
cjkBBFAcjkB9B9
institution
cjkD2A9cjkC6B7
drug
cjkB2C9cjkB9BA
purchase
cjkB7FEcjkCEF1
service
cjkD6D0cjkD0C4
center
]cjkA3AC
,
[conn cjkD5FD
just
cjkD2F2cjkCEAA
because
] [arg1 cjkD2BB
once
cjkBFAAcjkCABC
begin
cjkBECD
then
cjkB1C8cjkBDCF
relatively
cjkB9E6cjkB7B6
standardized
]cjkA3AC
,
[arg2cjkD4CBcjkD7AA
operate
cjkD6C1cjkBDF1
till now
cjkA3AC
,
cjkB3C9cjkBDBB
trade
cjkD2A9cjkC6B7
medicine
cjkD2BBcjkD2DAcjkB6E0
over 100 million
cjkD4AA
yuan
cjkA3AC
,
cjkC3BBcjkD3D0
not
cjkB7A2cjkCFD6
find
cjkD2BB
one
cjkC0FD
case
cjkBBD8cjkBFDB
kickback
]cjkA1A3
.
”It is because its operations are standardized that the first
purchase service center for medical institutions in China
opened in the new district of Pudong in the beginning of
last year has not found a single case of kickback after
it has traded 100 million yuan worth of medicine in its
operation till now.”
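One way to accommodate discontinuous arguments such as ARG1 in (3) (and, likewise, paired connectives made up of more than one lexical item) is to let every element of a relation be a list of spans rather than a single span. The Python sketch below is a hypothetical illustration, not the project's data format: it assumes half-open character offsets over a shortened English paraphrase of (3), and the `Relation` class and `locate` helper are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Tuple

Span = Tuple[int, int]  # half-open character offsets [start, end)


@dataclass
class Relation:
    conn: List[Span]  # more than one span for a paired connective
    arg1: List[Span]  # more than one span for a discontinuous argument
    arg2: List[Span]

    def text(self, source: str, role: str) -> str:
        """Return the text of 'conn', 'arg1' or 'arg2', pieces in document order."""
        spans = sorted(getattr(self, role))
        return " ... ".join(source[start:end] for start, end in spans)


src = ("the purchase center, just because its operation was standardized, "
       "has found no kickback")


def locate(sub: str) -> Span:
    """Find a substring (assumed to occur once) in src and return its span."""
    start = src.index(sub)
    return (start, start + len(sub))


# ARG1 spans are listed out of order on purpose; text() sorts them.
rel = Relation(conn=[locate("just because")],
               arg1=[locate("its operation was standardized"),
                     locate("the purchase center")],
               arg2=[locate("has found no kickback")])
print(rel.text(src, "arg1"))
# → the purchase center ... its operation was standardized
```

Both portions end up in the same argument, and the connective, which sits between them, stays outside it.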
Conjoined discourse connectives: The subordi-
nate conjunctions can be conjoined in Chinese so
that there are two subordinate clauses each having
one instance of the same subordinate conjunction.
In this case, there is still one discourse relation,
but ARG1 is the conjunction of the two subordinate
clauses. This is in contrast with English, where only one subordinate conjunction is possible and the two clauses of ARG1 are linked by a coordinate conjunction, as illustrated in the English translation.
(4) [conn cjkCBE4cjkC8BB
although
] [arg1cjkBBC6cjkB4BAcjkC3F7
Huang Chunming
cjkD2D1cjkBEAD
already
cjkCAAEcjkBCB8
over 10
cjkC4EA
year
cjkC3BBcjkD3D0
not
cjkB3F6cjkB0E6
publish
cjkD0A1cjkCBB5cjkBCAF
novel series
cjkC1CB
AS
]cjkA3AC
,
[conn
cjkCBE4cjkC8BB
although
] [arg1 cjkB4D3
from
cjkA1B4
”
cjkB3C7cjkD7D0
city boys
cjkC2E4
miss
cjkB3B5
bus
cjkA1B5
”
cjkB5BD
to
cjkA1B4
”
cjkCADBcjkC6B1cjkBFDA
ticket box
cjkA1B5
”
cjkA3AC
,
cjkD6D0cjkBCE4
middle
cjkB8F4
span
cjkC1CB
AS
cjkC8FDcjkCAAEcjkC6DF
thirty seven
cjkC4EA
year
]cjkA3AC
,
[conncjkB5AB
but
] [arg2cjkBBC6cjkB4BAcjkC3F7
Huang Chunming
cjkB5C4
DE
cjkCEC4cjkD1A7
literary
cjkC4DAcjkD4DA
theme
cjkA3AC
,
cjkD3D0cjkD0A9
some
cjkB6ABcjkCEF7
thing
cjkBEB9cjkC8BB
surprisingly
cjkB4D3cjkC0B4
ever
cjkB6BC
have
cjkC3BBcjkD3D0
not
cjkB8C4cjkB1E4
change
]cjkA1A3
.
”Although Huang Chunming has not published a novel series for over ten years, and his work spans thirty-seven years from ’City Boys Missed the Bus’ to ’Ticket Box’, surprisingly some things in Huang Chunming’s literary themes have never changed.”
2.2 Coordinate conjunctions
The second type of explicit discourse connectives
we annotate are coordinate discourse conjunctions.
The arguments of coordinate conjunctions are anno-
tated in the order in which they appear. The argu-
ment that appears first is labeled ARG1 and the ar-
gument that appears next is marked ARG2. The co-
ordinate conjunctions themselves, like subordinate
conjunctions, are excluded from the arguments.
(5) cjkBDFCcjkC4EA
recent years
cjkC0B4
in
cjkA3AC
,
cjkC3C0cjkB9FA
the U.S.
cjkC3BF
every
cjkC4EA
year
cjkCCC7cjkC4F2cjkB2A1
diabetes
cjkD2BDcjkC1C6cjkB7D1
medical expense
cjkD4BC
about
cjkD2BBcjkB0D9cjkD2DA
10 billion
cjkC3C0cjkD4AA
dollar
cjkA3AC
,
cjkD3A1cjkB6C8
India
cjkC8A5cjkC4EA
last year
cjkCCC7cjkC4F2cjkB2A1
diabetes
cjkD2BDcjkC1C6cjkB7D1
medical expenses
cjkCEAA
be
cjkC1F9cjkB5E3cjkD2BBcjkD2DA
six hundred and 10 million
cjkC3C0cjkD4AA
dollar
cjkA3AC
,
[arg1cjkD6D0cjkB9FA
China
cjkC9D0
yet
cjkCEDE
not have
cjkBEDFcjkCCE5
concrete
cjkCDB3cjkBCC6
statistics
]cjkA3AC
,
[conn cjkB5AB
but
] [arg2 cjkD6D0cjkB9FA
China
cjkCCC7cjkC4F2cjkB2A1
diabetes
cjkC8CBcjkCAFD
population
cjkD5FD
currently
cjkD2D4
with
cjkC3BF
every
cjkC4EA
year
cjkC6DFcjkCAAEcjkCEE5cjkCDF2
750,000
cjkD0C2
new
cjkBBBCcjkD5DF
patient
cjkB5C4
DE
cjkCBD9cjkB6C8
speed
cjkB5DDcjkD4F6
increase
]cjkA1A3
.
”In recent years, the annual medical expenses for diabetes patients in the U.S. have been about 10 billion dollars. Last year the medical expenses for diabetes patients in India were six hundred and ten million dollars. China does not have concrete statistics yet, but its diabetic population is increasing at a pace of 750,000 new patients per year.”
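In contrast to the subordinate case, labels for coordinate conjunctions follow linear order. A minimal Python sketch, assuming each unit comes with its character offset in the text; the offsets below are made up, and `label_coordinate` is a hypothetical name.

```python
from typing import Dict, List, Tuple

Unit = Tuple[int, str]  # (character offset of the unit, unit text)


def label_coordinate(units: List[Unit]) -> Dict[str, str]:
    """ARG1 is the unit that appears first in the text, ARG2 the one after it."""
    if len(units) != 2:
        raise ValueError("a coordinate conjunction takes exactly two arguments")
    first, second = sorted(units)  # tuples sort by offset first
    return {"ARG1": first[1], "ARG2": second[1]}


# English glosses of the two arguments of (5), passed in reverse order on purpose.
print(label_coordinate([
    (120, "China diabetes population currently increasing"),
    (80, "China not yet have concrete statistics"),
]))
```

The only difference from the subordinate rule is the source of the ordering: position in the text rather than syntactic role.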
Paired coordinate conjunctions: Like subordi-
nate conjunctions, coordinate conjunctions can also
be paired, as in (6):
(6) cjkCFD6cjkB4FA
modern
cjkB8B8cjkC4B8
parent
cjkC4D1
difficult
cjkCEAA
be
cjkB5C4
DE
cjkB5D8cjkB7BD
place
cjkD4DAcjkD3DA
lie in
[conncjkBCC8
CONN
] [arg1 cjkCEDEcjkB7A8
no way
cjkC5C5cjkB3FD
eliminate
cjkD1AAcjkD2BA
blood
cjkD6D0
in
cjkC1F7cjkB4AB
flow
cjkB5C4
DE
cjkB9DBcjkC4EE
tradition
]cjkA3AC
,
[conn cjkD3D6
CONN
] [arg2 cjkD2AA
need
cjkC3E6cjkB6D4
face
cjkD0C2
new
cjkB5C4
DE
cjkBCDBcjkD6B5
value
]cjkA1A3
.
”The difficulty of being modern parents lies in the fact that they cannot get rid of the traditional values flowing in their blood, and yet they also need to face new values.”
2.3 Adverbial connectives
The third type of explicit discourse connectives we annotate are discourse adverbials. A discourse adverbial differs from other adverbs in that it requires an antecedent that is a proposition or a set of related propositions. Generally, the second argument is adjacent to the discourse adverbial while the first argument may be long-distance. By convention, the argument that is adjacent to the discourse connective is labeled ARG2 and the other argument is labeled ARG1. Note that in (7b) the first argument is not adjacent to the discourse adverbial.
(7) a. cjkC3C0cjkB9FA
The U.S.
cjkC9CCcjkBBE1
Chamber of Commerce
cjkB9E3cjkB6AB
Guangdong
cjkB7D6cjkBBE1
Chapter
cjkBBE1cjkB3A4
Chairman
cjkBFB5cjkD3C0cjkBBAA
Kang Yonghua
cjkC2C9cjkCAA6
lawyer
cjkCBB5
say
cjkA3AC
,
[arg1 cjkBFCBcjkC1D6cjkB6D9
Clinton
cjkD5FEcjkB8AE
Administration
cjkD2D1cjkBEAD
already
cjkB1EDcjkCABE
indicate
cjkD2AA
will
cjkD1D3cjkB3A4
renew
cjkD6D0cjkB9FA
China
cjkB5C4
DE
cjkC3B3cjkD2D7
trade
cjkD7EEcjkBBDDcjkB9FA
MFN
cjkB4FDcjkD3F6
status
]cjkA3AC
,
[conn
cjkD2F2cjkB4CB
therefore
]cjkA3AC
,
[arg2 cjkD5E2
this
cjkB4CE
time
cjkD3CEcjkCBB5
lobby
cjkB5C4
DE
cjkD6D8cjkB5E3
focus
cjkCAC7
be
cjkC4C7cjkD0A9
those
cjkBDCF
relatively
cjkB1A3cjkCAD8
conservative
cjkB5C4
DE
cjkD2E9cjkD4B1
congressman
]cjkA1A3
.
”Lawyer Kang Yonghua, chairman of the Guangdong
Chapter of the U.S. Chamber of Commerce, says that
since the Clinton Administration has already indi-
cated that it will renew China’s MFN status, the focus
of the lobby this time is on those relatively conserva-
tive congressmen.”
b. [arg1 cjkD6D0cjkB9FA
China
cjkC5FAcjkD7BC
approve
cjkB5C4
DE
cjkCDE2cjkC6F3
foreign enterprise
cjkD6D0
in
cjkA3AC
,
cjkB9A4cjkD2B5
industry
cjkCFEEcjkC4BF
project
cjkD5BC
account for
cjkC6DFcjkB3C9
seventy percent
cjkA3AC
,
cjkC6E4cjkD6D0
among them
cjkBCD3cjkB9A4
processing
cjkB9A4cjkD2B5
industry
cjkC6ABcjkB6E0
excessive
]cjkA3AC
,
cjkD5E2
this
cjkD3EB
with
cjkD6D0cjkB9FA
China
cjkC0CDcjkB6AFcjkC1A6
labor force
cjkCBD8cjkD6CA
training
cjkA1A2
,
cjkB3C9cjkB1BE
cost
cjkBDCF
relatively
cjkB5CD
low
cjkB5C4
DE
cjkB9FAcjkC7E9
state of affairs
cjkCFE0cjkCEC7cjkBACF
consistent
cjkA3AC
,
[conn cjkB4D3cjkB6F8
therefore
] [arg2 cjkCEFCcjkC4C9
absorb
cjkC1CB
ASP
cjkB4F3cjkC1BF
big volume
cjkC0CDcjkB6AFcjkC1A6
labor force
]cjkA1A3
.
”Among the foreign enterprises that China approved, industrial projects account for seventy percent, and among these, processing industries are excessively numerous. This is consistent with the state of affairs in China, where the training and cost of the labor force are relatively low. Therefore they have absorbed a large volume of the labor force.”
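Because the first argument of a discourse adverbial need not be adjacent, an annotation tool might check only that ARG2 starts in the adverbial's clause and that ARG1 precedes it, and merely record whether ARG1 happens to be adjacent. The Python sketch below assumes clauses are numbered in document order; the clause numbering of (7b) and the helper `check_adverbial` are hypothetical.

```python
from typing import Dict, List


def check_adverbial(arg1: List[int], arg2: List[int], conn_clause: int) -> Dict[str, bool]:
    """Check clause positions for a relation anchored by a discourse adverbial.

    arg1, arg2: indices of the clauses each argument covers.
    conn_clause: index of the clause in which the adverbial occurs.
    """
    arg2_adjacent = min(arg2) == conn_clause   # ARG2 starts in the adverbial's clause
    arg1_precedes = max(arg1) < conn_clause    # ARG1 lies in the preceding discourse
    arg1_adjacent = arg1_precedes and max(arg1) == conn_clause - 1
    return {"arg2_adjacent": arg2_adjacent,
            "arg1_precedes": arg1_precedes,
            "arg1_adjacent": arg1_adjacent}


# (7b) as clauses 0-4: ARG1 = clauses 0-2, clause 3 ("this is consistent with ...")
# belongs to neither argument, and "therefore" + ARG2 form clause 4.
print(check_adverbial(arg1=[0, 1, 2], arg2=[4], conn_clause=4))
# → {'arg2_adjacent': True, 'arg1_precedes': True, 'arg1_adjacent': False}
```

For (7a) the same check would report ARG1 as adjacent; only ARG2's adjacency is treated as a requirement.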
2.4 Implicit discourse connectives
In addition to the explicit discourse connectives,
there are also implicit discourse connectives that
must be inferred from adjacent propositions. The
arguments for implicit discourse connectives are
marked in the order in which they occur, with the
argument that occurs first marked as ARG1 and the
other argument marked as ARG2. By convention
a punctuation mark is reserved as the place-holder
for the discourse connective. Where possible, the
annotator is asked to provide an explicit discourse
connective to characterize the type of discourse relation. In (8), for example, the coordinate conjunction 而 ”while” can be used in place of the implicit discourse connective.
(8) [arg1 cjkC6E4cjkD6D0
among them
cjkB3F6cjkBFDA
export
cjkCEAA
be
cjkD2BBcjkB0D9cjkC6DFcjkCAAEcjkB0CBcjkB5E3cjkC8FDcjkD2DA
17.83 billion
cjkC3C0cjkD4AA
dollar
cjkA3AC
,
cjkB1C8
compared with
cjkC8A5cjkC4EA
last year
cjkCDAC
same
cjkC6DA
period
cjkCFC2cjkBDB5
decrease
cjkB0D9cjkB7D6cjkD6AEcjkD2BBcjkB5E3cjkC8FD
1.3 percent
] [conn=cjkB6F8cjkA3BB
;
] [arg2 cjkBDF8cjkBFDA
import
cjkD2BBcjkB0D9cjkB0CBcjkCAAEcjkB6FEcjkB5E3cjkC6DFcjkD2DA
18.27 billion
cjkC3C0cjkD4AA
dollar
cjkA3AC
,
cjkD4F6cjkB3A4
increase
cjkB0D9cjkB7D6cjkD6AEcjkC8FDcjkCAAEcjkCBC4cjkB5E3cjkD2BB
34.1 percent
]cjkA1A3
.
”Among them, exports were 17.83 billion dollars, a 1.3 percent decrease from the same period last year. Meanwhile, imports were 18.27 billion dollars, a 34.1 percent increase.”
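The procedure for implicit connectives (adjacent units, a punctuation mark as place-holder, and an optional explicit connective supplied by the annotator) can be sketched as a candidate generator. This Python sketch assumes the place-holder is the full-width semicolon seen in (8); the function `implicit_candidates` and the shortened test sentence ("exports decreased; imports increased") are hypothetical.

```python
from typing import Dict, List


def implicit_candidates(sentence: str, boundary: str = "；") -> List[Dict[str, object]]:
    """Propose one implicit relation per boundary between adjacent units.

    ARG1 is the unit before the boundary, ARG2 the unit after it, and the
    boundary's offset serves as the place-holder for the implicit connective.
    'conn' is left empty for the annotator to fill in where possible.
    """
    parts = sentence.split(boundary)
    candidates = []
    offset = 0
    for left, right in zip(parts, parts[1:]):
        offset += len(left)  # offset of the boundary mark itself
        candidates.append({"ARG1": left, "ARG2": right,
                           "placeholder": offset, "conn": None})
        offset += len(boundary)  # skip past the boundary to the next unit
    return candidates


# A shortened version of (8): "exports decreased; imports increased".
cands = implicit_candidates("出口下降；进口增长")
cands[0]["conn"] = "而"  # the annotator supplies 'while', as in (8)
print(cands)
```

Every proposed candidate still needs a human decision: not every semicolon marks a discourse relation, and some relations cannot be paraphrased with an explicit connective.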
3 Where are the discourse connectives?
In Chinese, discourse connectives are generally clause-initial or clause-medial, although localizers are clause-final and can be used as discourse connectives by themselves or together with a preposition. Subordinate conjunctions, coordinate conjunctions and discourse adverbials can all occur in clause-initial as well as clause-medial positions. The distribution of the discourse connectives is not uniform and varies from connective to connective. Some discourse connectives alternate between clause-initial and clause-medial positions. The examples in (9) show that 尽管 ”even though”, which forms a paired connective with 但是 ”but”, occurs in both clause-initial (9a) and clause-medial (9b) positions.
(9) a. [conn cjkBEA1cjkB9DC
even though
] [arg1 cjkD1C7cjkD6DE
Asia
cjkD2BBcjkD0A9
some
cjkB9FAcjkBCD2
country
cjkB5C4
DE
cjkBDF0cjkC8DA
financial
cjkB6AFcjkB5B4
turmoil
cjkBBE1
will
cjkCAB9
make
cjkD5E2cjkD0A9
these
cjkB9FAcjkBCD2
country
cjkB5C4
DE
cjkBEADcjkBCC3
economy
cjkD4F6cjkB3A4
growth
cjkCADCcjkB5BD
experience
cjkD1CFcjkD6D8
serious
cjkD3B0cjkCFEC
impact
]cjkA3AC
,
[conn cjkB5AB
but
] [arg2 cjkBECD
to
cjkD5FB
whole
cjkB8F6
CL
cjkCAC0cjkBDE7
world
cjkBEADcjkBCC3
economy
cjkB6F8cjkD1D4
concerned
cjkA3AC
,
cjkC6E4cjkCBFB
other
cjkB9FAcjkBCD2
country
cjkB5C4
DE
cjkC7BFcjkBEA2
strong
cjkD4F6cjkB3A4
growth
cjkCAC6cjkCDB7
momentum
cjkBBE1
will
cjkC3D6cjkB2B9
compensate
cjkD5E2
this
cjkD2BB
one
cjkCBF0cjkCAA7
loss
]cjkA1A3
.
”Even though the financial turmoil in some Asian
countries will affect the economic growth of these
countries, as far as the economy of the whole world
is concerned, the strong economic growth of other
countries will make up for this loss.”
b. [arg1 cjkD5B9cjkCDFB
look ahead
cjkBBA2cjkC4EA
Year of Tiger
cjkA3AC
,
cjkD6D0cjkB9FA
China
cjkB5C4
DE
cjkBEADcjkBCC3
economy
cjkC1D0cjkB3B5
train
] [conn cjkBEA1cjkB9DC
even though
] [arg1cjkBBE1
will
cjkD3D0
have
cjkB5DFcjkF4A4
ups and
cjkC6F0cjkB7FC
downs
]cjkA3AC
,
[conncjkB5AB
but
] [arg2 cjkD6BBcjkD2AA
as long as
cjkB5F7cjkBFD8
adjust
cjkB4EBcjkCAA9
measure
cjkCACAcjkCAB1
timely
cjkA1A2
,
cjkB5C3cjkB5B1
proper
cjkA3AC
,
cjkCFE0cjkD0C5
believe
cjkBBE1
will
cjkD1D8cjkD7C5
along
cjkD4A4cjkC9E8
expect
cjkB5C4
DE
cjkB9ECcjkB5C0
track
cjkCEC8cjkBDA1
steady
cjkC7B0cjkD0D0
advance
]cjkA1A3
.
”Looking ahead at the Year of Tiger, even though
China’s economic train will have its ups and downs,
as long as the adjusting measures are timely and
proper, we believe that it will advance steadily along
the expected track.”
Localizers are a class of words that occur after
clauses or noun phrases to denote temporal or spatial
discourse relations. They can introduce a subordi-
nate clause by themselves or together with a preposi-
tion. While the preposition is optional, the localizer
is not. When both the preposition and the localizer
occur, they form a paired discourse connective anchoring a discourse relation. Example (10) shows that the preposition 当 and the localizer 时 form a paired discourse connective equivalent to the English subordinate conjunction ”when”.
(10) cjkC8D5cjkC7B0
a few days ago
cjkA3AC
,
[conncjkB5B1
when
] [arg1 cjkBCC7cjkD5DF
reporter
cjkD4DA
at
cjkD5E2cjkC0EF
here
cjkD7A8cjkB7C3
interview exclusively
cjkC5B7cjkC3CB
EU
cjkC5B7cjkD6DE
Europe
cjkCEAFcjkD4B1cjkBBE1
Commission
cjkD7A4cjkBBAA
to China
cjkB4FAcjkB1EDcjkCDC5
delegation
cjkCDC5cjkB3A4
head
cjkCEBAcjkB8F9cjkC9EE
Wei Genshen
cjkB4F3cjkCAB9
ambassador
cjkA3AC
,
cjkC7EB
ask
cjkCBFB
he
cjkC6C0cjkBCDB
comment
cjkD5E2
this
cjkD2BB
one
cjkC4EA
year
cjkC0B4
since
cjkCBABcjkB7BD
two sides
cjkB5C4
DE
cjkBACFcjkD7F7
cooperation
cjkB3C9cjkB9FB
accomplishment
] [conncjkCAB1
when
]cjkA3AC
,
[arg2 cjkCBFB
he
cjkBAC1
little
cjkB2BB
no
cjkB3D9cjkD2C9
hesitate
cjkB5D8
DE
cjkCBB5
say
cjkA3BA
:
cjkA1B0
’
cjkC5B7cjkC3CB
EU
cjkCDAC
with
cjkD6D0cjkB9FA
China
cjkB5C4
DE
cjkD5FEcjkD6CE
political
cjkB9D8cjkCFB5
relation
cjkA1A2
,
cjkC3B3cjkD2D7
trade
cjkB9D8cjkCFB5
relation
cjkD2D4cjkBCB0
and
cjkD4DA
at
cjkCDB6cjkD7CA
investment
cjkB5C8
etc.
cjkB7BDcjkC3E6
aspect
cjkB5C4
DE
cjkBACFcjkD7F7
cooperation
cjkD4DA
in
cjkD2BBcjkBEC5cjkBEC5cjkC6DFcjkC4EA
1997
cjkB6BC
all
cjkC8A1cjkB5C3
achieve
cjkC1CB
ASP
cjkCFD4cjkD6F8
significant
cjkB5C4
DE
cjkB7A2cjkD5B9
progress
cjkA1A3
.
cjkA1B1]
”
”A few days ago, when this reporter exclusively interviewed Ambassador Wei Genshen, head of the European Commission delegation to China, and asked him to comment on the accomplishments of the cooperation between the two sides over the past year, he said without any hesitation: ’In 1997, the EU and China made significant progress in their political relations, trade relations, and cooperation in investment and other areas.’ ”
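The observation that the localizer is obligatory while the preposition is optional suggests a simple pattern for locating candidate instances. The regular expression below is a toy heuristic written for illustration, not a CDTB tool. It assumes that the localizer 时 closes a clause followed by a full-width comma, treats 当 as the only possible preposition, and is run on short made-up sentences modeled on (10). Matches would still need manual review, since 时 does not always head a discourse relation.

```python
import re
from typing import Dict, List

# Optional preposition 当 + clause + obligatory localizer 时, followed by a comma.
# 当 is excluded inside the clause so that a match starts at the preposition
# when it is present; as a side effect, clauses containing 当 elsewhere are missed.
WHEN = re.compile(r"(当)?([^，。当]+?)(时)(?=，)")


def find_when(text: str) -> List[Dict[str, object]]:
    """Return connective spans and ARG1 text for each 当...时 or ...时 clause."""
    results = []
    for m in WHEN.finditer(text):
        if m.group(1) is None:
            conn = [m.span(3)]             # localizer alone
        else:
            conn = [m.span(1), m.span(3)]  # paired preposition + localizer
        results.append({"conn": conn, "arg1": m.group(2)})
    return results


print(find_when("几天前，当记者采访他时，他说"))  # paired connective
print(find_when("记者采访他时，他说"))            # localizer alone
```

When both words are present they come back as two spans of a single connective, mirroring the treatment of paired connectives as anchoring one relation.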
4 What counts as an argument?
This section examines the syntactic composition of
arguments to discourse connectives in Chinese. Ar-
guments of discourse relations are propositional sit-
uations such as events, states, or properties. As such
an argument of a discourse relation can be realized
as a clause or multiple clauses, a sentence or mul-
tiple sentences. Typically, a subordinate conjunc-
tion introduces clauses that are arguments in a dis-
course relation. Discourse adverbials and coordinate
conjunctions, however, can take one or more sentences as their arguments. The examples in (11) show that arguments to discourse connectives can be a single clause (11a), multiple clauses (11b), a single sentence (11c), or multiple sentences (11d).
(11) a. [conn cjkBEA1cjkB9DC
even though
] [arg1 cjkBDF1cjkC4EA
this year
cjkD2BB
January
cjkD6C1
to
cjkCAAEcjkD2BBcjkD4C2
November
cjkD6D0cjkB9FA
China
cjkC5FAcjkD7BC
approve
cjkC0FBcjkD3C3
utilize
cjkCDE2cjkD7CA
foreign investment
cjkCFEEcjkC4BF
project
cjkCAFD
number
cjkBACD
and
cjkBACFcjkCDAC
contract
cjkCDE2cjkD7CA
foreign investment
cjkBDF0cjkB6EE
amount
cjkB6BC
both
cjkB1C8
compared with
cjkC8A5cjkC4EA
last year
cjkCDAC
same
cjkC6DA
period
cjkD3D0cjkCBF9
have
cjkCFC2cjkBDB5
decrease
]cjkA3AC
,
[conncjkB5AB
but
] [arg2 cjkCAB5cjkBCCA
actually
cjkC0FBcjkD3C3
use
cjkCDE2cjkD7CA
foreign investment
cjkBDF0cjkB6EE
amount
cjkC8D4
still
cjkB1C8
compared with
cjkC8A5cjkC4EA
last year
cjkCDAC
same
cjkC6DA
period
cjkD4F6cjkB3A4
increase
cjkC1CB
ASP
cjkB0D9cjkB7D6cjkD6AEcjkB6FEcjkCAAEcjkC6DFcjkB5E3cjkC1E3cjkD2BB
27.01 percent
]cjkA1A3
.
”Even though the number of projects that use foreign
investment that China approved of and contractual
foreign investment both decreased compared with the
same period last year, the foreign investment that has
actually been used increased 27.01 percent.”
b. [conn cjkD3C9cjkD3DA
because
] [arg1 cjkC3A9cjkCCA8cjkBEC6
Maotai Liquor
cjkD6C6cjkD7F7
brew
cjkB9A4cjkD2D5
process
cjkB8B4cjkD4D3
complicated
cjkA3AC
,
cjkC9FAcjkB2FA
production
cjkD6DCcjkC6DA
cycle
cjkB3A4
long
]cjkA3AC
,
[conn
cjkD2F2cjkB6F8
therefore
] [arg2cjkC6E4
its
cjkB2FAcjkC1BF
production volume
cjkCAAEcjkB7D6
very
cjkD3D0cjkCFDE
limited
]cjkA1A3
.
”Because the brewing process of Maotai liquor is
complicated and its production cycle is long, its pro-
duction volume is very limited.”
c. [arg1 cjkD6D0cjkB9FA
Chinese
cjkC6B9cjkC5D2cjkC7F2
table tennis
cjkD4CBcjkB6AFcjkD4B1
athlete
cjkC3BBcjkD3D0
not
cjkB2CEcjkBCD3
participate
cjkB5DAcjkB6FEcjkCAAEcjkBEC5
twenty-ninth
cjkBACD
and
cjkC8FDcjkCAAE
thirtieth
cjkBDEC
CL
cjkCAC0cjkC6B9cjkC8FC
world table tennis tournament
]cjkA1A3
.
[conn cjkD2F2cjkB4CB
therefore
]
cjkA3AC
,
[arg2 cjkB8B4cjkD6C6
replicate
cjkB5C4
DE
cjkBDF0cjkC5C6
gold medal
cjkD6D0
in
cjkB0FCcjkC0A8
include
cjkBDAB
will
cjkD2AA
will
cjkBED9cjkD0D0
hold
cjkB5C4
DE
cjkB5DAcjkCBC4cjkCAAEcjkCEE5
forty-fifth
cjkBDEC
CL
cjkCAC0cjkC6B9cjkC8FC
world table tennis tournament
cjkBDF0cjkC5C6
gold medal
]cjkA1A3
.
”Chinese athletes did not attend the twenty-ninth and thirtieth world table tennis tournaments. Therefore, the replicated gold medals also include the gold medals of the forty-fifth world tournament, which is yet to be held.”
d. [arg1 cjkBBD8cjkB9E9
return
cjkBAF3
after
cjkB6D4
for
cjkB0C4cjkC3C5
Macao
cjkB5C4
DE
cjkCEB4cjkC0B4
future
cjkB7A2cjkD5B9
prospect
cjkCAC7
be
cjkC0FB
plus
cjkBBB9cjkCAC7
or
cjkB1D7
minus
cjkA3BF
?
cjkD3D0
have
cjkCEE5cjkB3C9cjkC8FD
53 percent
cjkB5C4
DE
cjkC8CB
people
cjkBBD8cjkB4F0
answer
cjkB2BB
not
cjkD6AAcjkB5C0
know
]cjkA1A3
.
[conn cjkB5AB
but
] [arg2 cjkB6D4cjkD3DA
to
cjkC4DC
can
cjkB2BB
not
cjkC4DC
can
cjkBDD3cjkCADC
accept
cjkBACD
like
cjkB8DB
Hong Kong
cjkB0C4
Macao
cjkD2BBcjkD1F9
same
cjkA3AC
,
cjkD2D4
with
cjkA1B8
’
cjkD2BB
one
cjkB9FA
country
cjkC1BD
two
cjkD6C6
system
cjkA1B9
’
cjkBDE2cjkBEF6
resolve
cjkCCA8cjkCDE5
Taiwan
cjkCECAcjkCCE2
issue
cjkA3AC
,
cjkD4F2
then
cjkD3D0
have
cjkB6FEcjkB3C9cjkC6DF
27 percent
cjkB5C4
DE
cjkC3F1cjkD6DA
people
cjkB1EDcjkCABE
indicate
cjkA1B8
’
cjkB2BB
not
cjkD6AAcjkB5C0
know
cjkA1B9
’
cjkA3AC
,
cjkCEE5cjkB3C9cjkBEC5
59 percent
cjkB5C4
DE
cjkC3F1cjkD6DA
people
cjkB1EDcjkCABE
indicate
cjkA1B8
’
cjkB2BB
not
cjkC4DC
can
cjkBDD3cjkCADC
accept
cjkA1B9
’
]cjkA1A3
.
”Is the return of sovereignty (to China) a plus or a minus for Macao’s future? 53 percent of people answer that they don’t know. But as to whether they can accept the resolution of the Taiwan issue with ’one country, two systems’ like Hong Kong and Macao, 27 percent of the people say they ’don’t know’ and 59 percent say they ’cannot accept’ it.”
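The four-way distinction illustrated in (11) can be made operational once clause and sentence boundaries are known. The Python sketch below assumes an external segmentation (sentences ending in 。 or ？, clauses separated by ，), which the annotation scheme itself does not prescribe. The function `argument_unit_type`, its "mixed" category, and the approximate clause counts for (11) are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def argument_unit_type(clauses_per_sentence: List[int],
                       covered: Set[Tuple[int, int]]) -> str:
    """Classify an argument by the (sentence, clause) positions it covers."""
    if not covered:
        raise ValueError("an argument must cover at least one clause")
    by_sentence: Dict[int, Set[int]] = {}
    for sent, clause in covered:
        by_sentence.setdefault(sent, set()).add(clause)
    # A sentence counts as fully covered if all of its clauses are in the argument.
    full = {s: len(cs) == clauses_per_sentence[s] for s, cs in by_sentence.items()}
    if len(by_sentence) == 1:
        sent, clauses = next(iter(by_sentence.items()))
        if full[sent]:
            return "single sentence"
        return "single clause" if len(clauses) == 1 else "multiple clauses"
    return "multiple sentences" if all(full.values()) else "mixed"


# Clause counts are approximate segmentations of the examples in (11).
print(argument_unit_type([2], {(0, 0)}))                # ARG1 of (11a)
print(argument_unit_type([3], {(0, 0), (0, 1)}))        # ARG1 of (11b)
print(argument_unit_type([1, 2], {(0, 0)}))             # ARG1 of (11c)
print(argument_unit_type([1, 1, 4], {(0, 0), (1, 0)}))  # ARG1 of (11d)
```

A one-clause sentence that is covered completely is reported as a sentence, matching the treatment of (11c).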
5 Argument Scope
Determining the scope of an argument to a discourse
connective has proved to be the most challenging
part of the discourse annotation. A lot of the effort
goes into deciding when certain text units should be
included in or excluded from the argument of a dis-
course connective. Under our annotation scheme,
the prepositional phrases, which generally precede
the subject in a Chinese clause, are included in the
argument of a discourse connective, as illustrated in
(12a). The material in the main clause that embeds a discourse relation, however, is excluded, as in (12b).
(12) a. cjkC1EDcjkCDE2
in addition
cjkA3AC
,
[arg1 cjkD4DA
in
cjkD0DDcjkCFD0
recreation
cjkCEC4cjkBBAF
culture
cjkC9FAcjkBBEE
life
cjkC8B1cjkB7A6
lack
cjkB5C4
DE
cjkB6ABcjkDDB8
Dongguan
]cjkA3AC
,
[conn cjkB3FDcjkB7C7
unless
] [arg1 cjkBADC
very
cjkD3D0
have
cjkBDCCcjkD3FD
education
cjkC8C8cjkB3CF
enthusiasm
]cjkA3AC
,
[conn cjkB7F1cjkD4F2
otherwise
] [arg2
cjkBADC
very
cjkC4D1
difficult
cjkC1F4cjkD7A1
keep
cjkBDCCcjkCAA6
teacher
]cjkA1A3
.
” In addition, in Dongguan where recreational ac-
tivities are lacking, unless they are very enthusiastic
about education, it is very hard to keep teachers.”
b. cjkC8CEcjkD6BEcjkB8D5
Ren Zhigang
cjkBBB9
also
cjkB1EDcjkCABE
indicate
cjkA3AC
,
[conn cjkD3C9cjkD3DA
because
] [arg1
cjkCFE3cjkB8DB
Hong Kong
cjkBACD
and
cjkC3C0cjkB9FA
the U.S.
cjkCFA2
interest
cjkB2EE
discrepancy
cjkB4EF
reach
cjkD2BBcjkB0D9cjkB6FEcjkCAAEcjkCEE5
125
cjkB5E3
point
]cjkA3AC
,
[arg2cjkC8E7cjkB9FB
if
cjkCAD0cjkB3A1
market
cjkB6D4
in
cjkCFE3cjkB8DB
Hong Kong
cjkBEADcjkBCC3
economic
cjkC7B0cjkBEB0
prospect
cjkB3E4cjkC2FA
full of
cjkD0C5cjkD0C4
confidence
cjkA3AC
,
cjkC8D4
still
cjkD3D0
have
cjkBCF5
reduce
cjkCFA2
interest
cjkBFD5cjkBCE4
space
]cjkA1A3
.
”Ren Zhigang also indicated that because the interest rate gap between Hong Kong and the U.S. has reached 125 points, if the market is fully confident in Hong Kong’s economic prospects, there is still room for reducing interest rates.”
Much of the challenge in determining the scope of an argument stems from the fact that discourse structures are recursive. As such, identifying the scope of an argument effectively amounts to determining how the discourse relations are hierarchically organized. This is illustrated in (13), where the discourse relation anchored by the coordinate conjunction 但 ”but” is embedded within the discourse relation anchored by the subordinate conjunction 如果 ”if”. The ambiguity is whether the conditional clause introduced by 如果 has scope over one or both of the clauses coordinated by 但 ”but”.
(13) cjkB1A8cjkB8E6
report
cjkC8CFcjkCEAA
believe
cjkA3AC
,
[conncjkC8E7cjkB9FB
if
] [arg1 cjkBEADcjkBCC3
economy
cjkBACD
and
cjkBDF0cjkC8DA
finance
cjkD5FEcjkB2DF
policy
cjkB5C3cjkC1A6
effective
]cjkA3AC
,
[arg2 [arg1 cjkD1C7cjkD6DE
Asia
cjkB5D8cjkC7F8
region
cjkBEADcjkBCC3
economy
cjkBFC9cjkCDFB
expect
cjkD4DA
in
cjkA3B1cjkA3B9cjkA3B9cjkA3B9cjkC4EA
1999
cjkBFAAcjkCABC
begin
cjkBBD8cjkC9FD
recover
]cjkA3AC
,
[conn
cjkB5AB
but
] [arg2cjkB2BB
not
cjkBBE1
will
cjkCFF3
like
cjkC4ABcjkCEF7cjkB8E7
Mexico
cjkBACD
and
cjkB0A2cjkB8F9cjkCDA2
Argentina
cjkD4DA
in
cjkA3B1cjkA3B9cjkA3B9cjkA3B4
1994
cjkA3AD
to
cjkA3B1cjkA3B9cjkA3B9cjkA3B5cjkC4EA
1995
cjkBDF0cjkC8DA
finance
cjkCEA3cjkBBFA
crisis
cjkBAF3
after
cjkC4C7cjkD1F9
like that
cjkB3F6cjkCFD6
occur
cjkB8DFcjkCBD9
high-speed
cjkA3D6cjkD0CE
V-shaped
cjkB4F3
big
cjkBBD8cjkC9FD
recovery
]]cjkA1A3
.
”The report believes that if the economic and financial
policies are effective, the economy of Asia is expected
to recover, but there will not be a V-shaped high-speed
recovery like the one after the financial crisis of Mexico
and Argentina in 1994 and 1995.”
Given our bottom-up approach, in which discourse connectives anchor binary discourse relations, we do not explicitly annotate hierarchical structures between the arguments. However, such hierarchical structures can be deduced when some discourse relations are recursively embedded as arguments of another discourse connective.
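Since the hierarchy is not annotated directly, it has to be recovered from the argument spans. One possible way, sketched below in Python, is to treat a relation as embedded in the smallest argument of another relation that contains both of its arguments. The sketch assumes contiguous arguments expressed as ranges of clause indices (discontinuous arguments are ignored), and the function `embedding_parents` is hypothetical.

```python
from typing import Dict, Optional, Tuple

Span = Tuple[int, int]  # half-open range of clause indices [start, end)


def embedding_parents(relations: Dict[str, Dict[str, Span]]
                      ) -> Dict[str, Optional[Tuple[str, str]]]:
    """For each relation, find the smallest argument of another relation
    that contains the whole relation (both of its arguments)."""
    parents: Dict[str, Optional[Tuple[str, str]]] = {}
    for name, args in relations.items():
        lo = min(start for start, _ in args.values())
        hi = max(end for _, end in args.values())
        best = None  # (size of containing argument, parent relation, role)
        for other, other_args in relations.items():
            if other == name:
                continue
            for role, (start, end) in other_args.items():
                if start <= lo and hi <= end and (best is None or end - start < best[0]):
                    best = (end - start, other, role)
        parents[name] = None if best is None else (best[1], best[2])
    return parents


# The reading of (13) shown in its brackets: clause 0 = the "if" clause,
# clause 1 = "Asian economy expected to recover", clause 2 = "but no V-shaped recovery".
relations = {
    "if":  {"arg1": (0, 1), "arg2": (1, 3)},
    "but": {"arg1": (1, 2), "arg2": (2, 3)},
}
print(embedding_parents(relations))
# → {'if': None, 'but': ('if', 'arg2')}
```

Under the other reading of (13), where the conditional scopes only over clause 1, ARG2 of ”if” would be (1, 2); neither relation would then contain the other, and the function reports no embedding.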
6 Sense Disambiguation
Although discourse connectives are often considered to be a closed set, some lexical items in Chinese can be used both as a discourse connective and as a non-discourse connective. In this case it is important to tease the two uses apart. There are also discourse connectives that have different senses, and it is potentially beneficial for certain NLP applications to disambiguate these senses. Machine Translation, for example, would need to translate the different senses into different discourse connectives in the target language. The examples in (14) show the different senses of 而, which can be translated into ”while” (14a), ”but” (14c), ”and” (14d) and ”instead” (14e). Note that in (14e) it is important for the first argument to be negated by 不 ”not”. In (14b), however, 而 is not a discourse connective. It does not seem to contribute any meaning to the sentence and is probably just there to satisfy some prosodic constraint.
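The two subtasks just described, separating discourse from non-discourse uses and distinguishing senses, could share one tag set in which a null tag marks non-discourse uses. The Python sketch below assumes sense labels named after the English translations of 而 listed above; the actual CDTB sense inventory is not specified here, so `ER_SENSES` and `sense_distribution` are hypothetical. It validates tags against the inventory and tallies them, the kind of summary a downstream system (for example, an MT component choosing a target connective) might rely on.

```python
from collections import Counter
from typing import List, Optional, Tuple

# Hypothetical sense labels for 而, named after the English connective
# used to translate each sense in (14).
ER_SENSES = ("while", "but", "and", "instead")


def sense_distribution(tags: List[Optional[str]]) -> Tuple[Counter, int]:
    """Tally sense tags for 而; None marks a non-discourse use, as in (14b)."""
    unknown = sorted({t for t in tags if t is not None and t not in ER_SENSES})
    if unknown:
        raise ValueError(f"unknown sense tags: {unknown}")
    senses = Counter(t for t in tags if t is not None)
    non_discourse = sum(1 for t in tags if t is None)
    return senses, non_discourse


# Tags for the instances shown in (14a), (14b) and (14c).
senses, non_discourse = sense_distribution(["while", None, "but"])
print(senses, non_discourse)
```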
(14) a. cjkA3B1cjkA3B9cjkA3B9cjkA3B7cjkC4EA
1997
cjkB7A2cjkB4EF
developed
cjkB9FAcjkBCD2
country
cjkBEADcjkBCC3
economic
cjkD0CEcjkCAC6
situation
cjkB5C4
DE
cjkCCD8cjkB5E3
characteristic
cjkCAC7
be
[arg1 cjkC3C0cjkB9FA
U.S.
cjkD4F6cjkB3A4
grow
cjkC7BFcjkBEA2
strongly
]
[conn cjkB6F8
while
] [arg2 cjkC8D5cjkB1BE
Japan
cjkBEADcjkBCC3
economy
cjkC6A3cjkC8ED
weak
]cjkA3AC
,
cjkC3C0cjkB9FA
U.S.
cjkBEADcjkBCC3
economic
cjkD4F6cjkB3A4cjkC2CA
growth
cjkB9C0cjkBCC6
estimate
cjkCEAA
be
cjkB0D9cjkB7D6cjkD6AEcjkC8FDcjkB5E3cjkC6DF
3.7 percent
cjkA3AC
,
cjkC8D5cjkB1BE
Japan
cjkBDF6
only
cjkCEAA
be
cjkB0D9cjkB7D6cjkD6AEcjkC1E3cjkB5E3cjkB0CB
0.8 percent
cjkA1A3
.
”The economic situation of the developed countries
in 1997 is characterized by strong growth in the U.S.
while the Japanese economy is weak. The U.S.
economic growth rate is estimated at 3.7 percent,
while Japan’s is only 0.8 percent.”
b. cjkCBAEcjkB6AB
Shuidong
cjkBFAAcjkB7A2cjkC7F8
Development Zone
cjkCEBB
located
cjkD3DA
at
cjkD4C1cjkCEF7
western Guangdong
cjkB5D8cjkC7F8
region
cjkB5C4
DE
cjkC3AFcjkC3FBcjkCAD0
Maoming city
cjkBEB3cjkC4DA
territory
cjkA3AC
,
cjkC3E6cjkBBFD
coverage
cjkB0CBcjkCAAEcjkB6E0
over eighty
cjkC6BDcjkB7BDcjkB9ABcjkC0EF
square kilometer
cjkA3AC
,
cjkCAC7
be
cjkCACAcjkD3A6
suit
cjkD2D2cjkCFA9
ethylene
cjkB9A4cjkB3CC
project
cjkB5C4
DE
cjkD0E8cjkD2AA
need
[? cjkB6F8
?
]cjkBDA8cjkC1A2
establish
cjkB5C4
DE
cjkD2BB
one
cjkB8F6
CL
cjkBAF3cjkBCCC
downstream
cjkBCD3cjkB9A4
process
cjkBBF9cjkB5D8
base
cjkA1A3
.
”Shuidong Development Zone, located in Maoming
City of western Guangdong, occupies an area of over
eighty square kilometers. It is a downstream
processing base established to meet the needs of the
ethylene project.”
c. cjkC4DC
can
cjkC9FAcjkB2FA
produce
[arg1 cjkD6D0cjkB9FA
China
cjkB2BB
not
cjkC4DC
can
cjkC9FAcjkB2FA
produce
] [conn
cjkB6F8
but
] [arg2 cjkD3D6
again
cjkBADC
badly
cjkD0E8cjkD2AA
need
]cjkB5C4
DE
cjkD2A9cjkC6B7
drug
cjkB5C4
DE
cjkC6F3cjkD2B5
enterprise
”Enterprises that can produce drugs that China badly
needs but cannot produce”
d. cjkBCAAcjkC1D6cjkCAA1
Jilin Province
cjkE7F5cjkB4BAcjkCAD0
Huichun City
cjkCAD0cjkB3A4
mayor
cjkBDF0cjkCBB6cjkC8CA
Jin Shuoren
cjkCBB5
say
cjkA3BA
:
cjkA1B0
”
cjkB9FAcjkBCCA
international
cjkC9E7cjkBBE1
community
cjkB5C4
DE
cjkD6A7cjkB3D6
support
cjkBACD
and
cjkB2CEcjkD3EB
participation
cjkA3AC
,
cjkB6D4cjkD3DA
to
cjkE7F5cjkB4BA
Huichun
cjkB5C4
DE
cjkBFAAcjkB7A2
development
cjkBFAAcjkB7C5
opening to the outside
cjkC6F0
play
cjkC1CB
ASP
[arg1 cjkBBFDcjkBCAB
positive
]
[conn cjkB6F8
and
] [arg2 cjkB9D8cjkBCFC
key
]cjkB5C4
DE
cjkD7F7cjkD3C3
role
cjkA1A3
.
cjkA1B1
”
”Jin Shuoren, mayor of Huichun City of Jilin
Province, said: ”The support and participation of the
international community played a positive and key
role in Huichun’s development and opening up to the
outside.”
e. [arg1 cjkD5E2
this
cjkB5B1cjkC8BB
certainly
cjkB2BB
not
cjkCAC7
be
cjkC0FAcjkCAB7
history
cjkB5C4
DE
cjkC7C9cjkBACF
coincidence
]
cjkA3AC
,
[conn cjkB6F8
instead
] [arg2 cjkCAC7
be
cjkC0FAcjkCAB7
history
cjkB5C4
DE
cjkBBFDcjkC0DB
accumulation
cjkBACD
and
cjkD7AAcjkBDD3
transition
]cjkA1A3
.
”This certainly is not historical coincidence. Instead
it is historical accumulation and transition.”
7 Discourse Connective Variation
The flip side of sense disambiguation is that a
single discourse relation is often realized by
different discourse connectives, owing to the long
evolution of the Chinese language and to
morphological processes such as suoxie, a form of
abbreviation. The examples in (15) show different
variants of the connective for the discourse relation
of concession. These forms are so similar that they
can hardly be considered different discourse
connectives. In principle, any combination of a
part 1 form and a part 2 form from Table 1 can make
up a paired discourse connective, subject to some
constraints that are not related to discourse. In
(15a), for example, the abbreviated 虽 can occur
only in clause-medial position. (15b) shows that the
second part of the paired discourse connective can be
dropped without changing the semantics of the
discourse relation, and (15c) shows that the second
part can be combined with another discourse
connective.
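Treating any part 1 form as combinable with any part 2 form amounts to a Cartesian product, with a dropped second part (as in (15b)) as one more option. The minimal Python sketch below enumerates the concessive pairs from the ”although” row of Table 1 and maps every variant to a single relation label. The position filter encodes only the one constraint stated for 虽 in (15a); the function names and the `clause_initial` flag are hypothetical, and the other, unspecified constraints are ignored.

```python
from itertools import product
from typing import List, Optional, Tuple

# Concessive ("although") variants from Table 1.
PART1 = ["虽然", "虽说", "虽"]
PART2 = ["但是", "但", "还是", "可是", "却", "然而", "不过"]

# Every surface variant is mapped to one relation label (hypothetical normalization).
RELATION = {form: "concession" for form in PART1 + PART2}


def paired_forms(allow_drop: bool = True) -> List[Tuple[str, Optional[str]]]:
    """All part-1/part-2 combinations; None stands for a dropped second part."""
    second: List[Optional[str]] = list(PART2) + ([None] if allow_drop else [])
    return list(product(PART1, second))


def position_ok(part1: str, clause_initial: bool) -> bool:
    """The abbreviated 虽 occurs only clause-medially, as in (15a)."""
    return not (part1 == "虽" and clause_initial)


pairs = paired_forms()
print(len(pairs))                           # 24 (3 x 7 pairs plus 3 with part 2 dropped)
print(len(paired_forms(allow_drop=False)))  # 21
print(len([p for p in pairs if position_ok(p[0], clause_initial=True)]))  # 16
print({RELATION[p1] for p1, _ in pairs})    # {'concession'}
```

Normalizing all variants to one label is what allows them to be counted as a single discourse relation rather than as distinct connectives.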
gloss       discourse connectives
although    [1] 虽然, 虽说, 虽  [2] 但是, 但, 还是, 可是, 却, 然而, 不过
because     [1] 因为, 因, 由于  [2] 所以
if          [1] 如果, 若, 假如  [2] 就
therefore   因此, 于是
Table 1: Discourse connective variation
(15) a. [arg1 cjkCDF5cjkCFE8
Wang Xiang
] [conn cjkCBE4
although
] [arg1
cjkC4EAcjkB9FDcjkB0EBcjkB0D9
over fifty years old
]cjkA3AC
,
[conn cjkB5AB
but
] [arg2 cjkC6E4
his
cjkB3E4cjkC5E6
abundant
cjkB5C4
DE
cjkBEABcjkC1A6
energy
cjkBACD
and
cjkC3F4cjkBDDD
quick
cjkB5C4
DE
cjkCBBCcjkCEAC
thinking
cjkA3AC
,
cjkB8F8
give
cjkC8CB
people
cjkD2D4
with
cjkD2BB
one
cjkB8F6
CL
cjkCCF4cjkD5BDcjkD5DF
challenger
cjkB5C4
DE
cjkD3A1cjkCFF3
impression
]cjkA1A3
.
”Although Wang Xiang is over fifty years old, his
abundant energy and quick thinking give people the
impression of a challenger.”
b. [arg1 cjkCDE2cjkD4DA
external
cjkB5C4
DE
cjkBBB7cjkBEB3
environment
] [conn cjkCBE4cjkC8BB
although
]
[arg1 cjkB8C4cjkB1E4
change
cjkC1CB
ASP
]cjkA3AC
,
[arg2 cjkC4DAcjkD0C4
heart
cjkC4C7
that
cjkB7DD
CL
cjkBFCAcjkCDFB
long for
cjkBCC7cjkD2E4
memory
cjkD3EB
and
cjkB9E9cjkCAF4
sense of belonging
cjkB5C4
DE
cjkD0E8cjkC7F3
need
cjkBADC
very
cjkC4D1
difficult
cjkB8C4cjkB1E4
change
]cjkA1A3
.
”Although the external environment has changed, the
heart’s need for memory and a sense of belonging is
very difficult to change.”
c. [arg1 cjkB4F3cjkC2BD
mainland
cjkD5FEcjkB2DF
policy
] [conn cjkCBE4cjkC8BB
although
] [arg1
cjkB6AFcjkE9FCcjkB5C3cjkBECC
vulnerable to criticism
]cjkA3AC
,
[conn cjkB5AB
but
cjkC8B4
but
] [arg2 cjkCAC7
be
cjkCBF9cjkD3D0
all
cjkD5FEcjkB2DF
policy
cjkB5C4
DE
cjkBBF9cjkB4A1
basis
]cjkA3AC
,
cjkC8CEcjkBACE
any
cjkBAF2cjkD1A1cjkC8CB
candidate
cjkB6BC
all
cjkCEDEcjkB7A8
cannot
cjkBAF6cjkCAD3
ignore
cjkA1A3
.
”Although the mainland policy is vulnerable to
criticism, it is the basis of all policies, and no
candidate can afford to ignore it.”
8 Conclusion
We examined the range of discourse connectives we
plan to annotate for the Chinese Discourse Treebank
project. We have shown that while the arguments to
subordinate and coordinate conjunctions can be
identified locally, the arguments to discourse
adverbials may be long-distance. We also examined
the distribution of the discourse connectives in
Chinese, as well as the syntactic composition and the
scope of the arguments in discourse relations. We
have shown that the most challenging issue in
discourse annotation is determining the text span of
a discourse argument, partly because of the
hierarchical nature of discourse structures. Finally,
we discussed the need to address sense
disambiguation and discourse connective variation in
our annotation of Chinese discourse connectives.

References
Lynn Carlson, Daniel Marcu, and Mary Ellen Okurowski.
2003. Building a Discourse-Tagged Corpus in the
Framework of Rhetorical Structure Theory. In Current
Directions in Discourse and Dialogue. Kluwer Aca-
demic Publishers.
William Mann and Sandra Thompson. 1988. Rhetorical
Structure Theory. Text, 8(3):243–281.
E. Miltsakaki, R. Prasad, A. Joshi, and B. Webber. 2004a.
The Penn Discourse Treebank. In Proceedings of the
4th International Conference on Language Resources
and Evaluation, Lisbon, Portugal.
E. Miltsakaki, R. Prasad, A. Joshi, and B. Webber.
2004b. The Penn Discourse Treebank. In Proceedings
of the NAACL/HLT Workshop on Frontiers in Corpus
Annotation, Boston, Massachusetts.
Martha Palmer, Dan Gildea, and Paul Kingsbury. 2005.
The proposition bank: An annotated corpus of seman-
tic roles. Computational Linguistics, 31(1).
B. Webber and A. Joshi. 1998. Anchoring a lexicalized
tree-adjoining grammar for discourse. In ACL/COLING
Workshop on Discourse Relations and Discourse
Markers, Montreal, Canada.
Bonnie Webber, Alistair Knott, Matthew Stone, and
Aravind Joshi. 1999. Discourse Relations: A Structural
and Presuppositional Account using Lexicalized TAG.
In Meeting of the Association for Computational
Linguistics, College Park, MD.
Bonnie Webber, Aravind Joshi, Matthew Stone, and Al-
istair Knott. 2003. Anaphora and discourse structure.
Computational Linguistics, 29(4):545–587.
Nianwen Xue and Martha Palmer. 2003. Annotating the
Propositions in the Penn Chinese Treebank. In The
Proceedings of the 2nd SIGHAN Workshop on Chinese
Language Processing, Sapporo, Japan.
Nianwen Xue, Fei Xia, Fu-Dong Chiou, and Martha
Palmer. To appear. The Penn Chinese Treebank:
Phrase Structure Annotation of a Large Corpus.
Natural Language Engineering.
