Annotating the Propositions in the Penn Chinese Treebank
Nianwen Xue
Dept. of Computer and Info. Science
University of Pennsylvania
Philadelphia, PA 19104, USA
xueniwen@linc.cis.upenn.edu
Martha Palmer
Dept. of Computer and Info. Science
University of Pennsylvania
Philadelphia, PA 19104, USA
mpalmer@linc.cis.upenn.edu
Abstract
In this paper, we describe an approach to
annotate the propositions in the Penn Chi-
nese Treebank. We describe how diathe-
sis alternation patterns can be used to
make coarse sense distinctions for Chi-
nese verbs as a necessary step in anno-
tating the predicate-structure of Chinese
verbs. We then discuss the representation
scheme we use to label the semantic argu-
ments and adjuncts of the predicates. We
discuss several complications for this type
of annotation and describe our solutions.
We then discuss how a lexical database
with predicate-argument structure infor-
mation can be used to ensure consistent
annotation. Finally, we discuss possible
applications for this resource.
1 Introduction
Linguistically interpreted corpora are instrumental
in supervised machine learning paradigms of natu-
ral language processing. The information encoded
in the corpora to a large extent determines what can
be learned by supervised machine learning systems.
Therefore, it is crucial to encode the desired level of
information for its automatic acquisition. The cre-
ation of the Penn English Treebank (Marcus et al.,
1993), a syntactically interpreted corpus, played a
crucial role in the advances in natural language pars-
ing technology (Collins, 1997; Collins, 2000; Char-
niak, 2000) for English. The creation of the Penn
Chinese Treebank (Xia et al., 2000) is also begin-
ning to help advance technologies in Chinese syn-
tactic analysis (Chiang, 2000; Bikel and Chiang,
2000). Since the treebanks are generally syntac-
tically oriented (cf. Sinica Treebank (Chen et al.,
to appear)), the information encoded there is ”shal-
low”. Important information useful for natural lan-
guage applications is missing. Most notably, signifi-
cant regularities in the predicate-argument structure
of lexical items are not captured. Recent effort in
semantic annotation, the creation of the Penn Propo-
sition Bank (Kingsbury and Palmer, 2002) on top
of the Penn English Treebank is beginning to ad-
dress this issue for English. In this new layer of
annotation, the regularities of the predicates, mostly
verbs, are captured in the predicate-argument struc-
ture. For example, in the sentences “The Congress
passed the bill” and “The bill passed”, it is intu-
itively clear that “the bill” plays the same role in the
two occurrences of the verb “pass”. Similar regular-
ities also exist in Chinese. For example, in “ /this
/CL /bill /pass /AS” and “ /Congress
/pass /AS /this /CL /bill”, “ /bill”
also plays the same role for the verb “ /pass” even
though it occurs in different syntactic positions (sub-
ject and object respectively).
Capturing such lexical regularities requires a
”deeper” level of annotation than generally provided
in a typical syntactically oriented treebank. It also
requires making sense distinctions at the appropriate
granularity. For example, the regularities demon-
strated for “pass” does not exist in other senses of
this verb. For example, in “He passed the exam”
and “He passed”, the object “the exam” of the tran-
sitive use of “pass” does not play the same role as
the subject “he” of the intransitive use. In fact, the
subject plays the same role in both sentences.
However, how deep the annotation can go is con-
strained by two important factors: how consistently
human annotators can implement this type of anno-
tation (the consistency issue) and whether the an-
notated information is learnable by machine (the
learnability issue). Making fine-grained sense dis-
tinctions, in particular, has been known to be dif-
ficult for human annotators as well as machine-
learning systems (Palmer et al., submitted). It seems
generally true that structural information is more
learnable than non-structural information, as evi-
denced by the higher parsing accuracy and relatively
poor fine-grained WSD accuracy. With this in mind,
we will propose a level of semantic annotation that
still can be captured in structural terms and add this
level of annotation to the Penn Chinese Treebank.
The rest of the paper is organized as follows. In Sec-
tion 2, we will discuss the annotation model in de-
tail and describe our representation scheme. We will
discuss some complications in Section 3 and some
implementation issues in Section 4. Possible appli-
cations of this resource are discussed in Section 5.
We will conclude in Section 6.
2 Annotation Model
In this section we describe a model that annotates
the predicate-argument structure of Chinese pred-
icates. This model captures the lexical regulari-
ties by assuming that different instances of a pred-
icate, usually a verb, have the same predicate argu-
ment structure if they have the same sense. Defin-
ing sense has been one of the most thorny issues
in natural language research (Ide and Vronis, 1998),
and the term ”sense” has been used to mean differ-
ent things, ranging from part-of-speech and homo-
phones, which are easier to define, to slippery fine-
grained semantic distinctions that are hard to make
consistently. Determining the ”right” level of sense
distinction for natural language applications is ul-
timately an empirical issue, with the best level of
sense distinction being the level with the least granu-
larity and yet sufficient for a natural language appli-
cation in question. Without gearing towards one par-
ticular application, our strategy is to use the struc-
tural regularities demonstrated in Section 1 to define
sense. Finer sense distinctions without clear struc-
tural indications are avoided. All instances of a pred-
icate that realize the same set of semantic roles are
assumed to have one sense, with the understanding
that not all of the semantic roles for this verb sense
have to be realized in a given verb instance, and that
the same semantic role may be realized in different
syntactic positions. All the possible syntactic real-
izations of the same set of semantic roles for a verb
sense are then alternations of one another. This
state of affairs has been characterized as diathe-
sis alternation and used to establish cross-predicate
generalizations and classifications (Levin, 1993). It
has been hypothesized and demonstrated that verbs
sharing the same disthesis alternation patterns also
have similar meaning postulates. It is equally plausi-
ble to assume then that verb instances having differ-
ent diathesis alternation patterns also have different
semantic properties and thus different senses.
Using diathesis alternation patterns as a diagnos-
tic test, we can identify the different senses for a
verb. Alternating syntactic frames for a particular
verb sense realizing the same set of semantic roles
(we call this roleset) form a frameset and share sim-
ilar semantic properties. It is easy to see that each
frameset, a set of syntactic frames for a verb, corre-
sponds with one roleset and vice versa. From now
on, we use the term frameset instead of sense for
clarity. Each frameset consists of one or more syn-
tactic frames and each syntactic frame realizes one
or more semantic roles. One frame differs from an-
other in the number and type of arguments its pred-
icate actually takes, and one frameset differs from
another in the total number and type of arguments
its predicate CAN take. This is illustrated graphi-
cally in Figure 1.
Annotating the predicate-argument structure in-
volves mapping the frameset identification informa-
tion for a predicate to an actual predicate instance in
the corpus and assign the semantic roles to its argu-
ments based on the syntactic frame of that predicate
instance. It is hoped that since framesets are defined
through diathesis alternation of syntactic frames, the
distinctions made are still structural in nature and
thus are machine-learnable and can be consistently
annotated by human annotators.
So far our discussion has focused on semantic ar-
Verb
FS                    FS               FS                 ......          FS
FR FR FR......FR j
        i
..... Argk Arg ArgArg0 1 2
   0        1   2
0 1 2
FS = Frameset       FR = Syntactic Frames       Arg = Arguments
Figure 1: Annotation model
guments, which play a central role in determining
the syntactic frames and framesets. There are other
elements in a proposition: semantic adjuncts. Com-
pared with semantic arguments, semantic adjuncts
do not play a role in defining the syntactic frames
or framesets because they occur in a wide variety of
predicates and as a result are not as discriminative as
semantic arguments. On the other hand, since they
can co-occur with a wide variety of predicates, they
are more generalizable and classifiable than seman-
tic arguments. In the next section, we will describe a
representation scheme that captures this dichotomy.
2.1 Representing arguments and adjuncts
Since the number and type of semantic arguments
for a predicate are unique and thus define the seman-
tic roles for a predicate, we label the arguments for
a predicate with a contiguous sequence of integers,
in the form of argN, where a0 is the integer between
0 and 5. Generally, a predicate has fewer than 6 ar-
guments. Since semantic adjuncts are not subcate-
gorized for by the predicate, we use one label argM
for all semantic adjuncts. ArgN identifies the argu-
ments while argM identifies all adjuncts. An argN
uniquely identifies an argument of a predicate even
if it occupies different syntactic positions in different
predicate instances. Missing arguments of a predi-
cate instance can be inferred by noting the missing
argument labels.
Additionally, we also use secondary tags to gen-
eralize and classify the semantic arguments and ad-
juncts when possible. For example, an adjunct re-
ceiving a a1a3a2a5a4 tag if it is a temporal adjunct. The
secondary tags are reserved for semantic adjuncts,
predicates that serve as arguments, as well as certain
arguments for phrasal verbs. The 18 secondary tags
and their descriptions are presented in Table 1.
11 functional tags for semantic adjuncts
ADV adverbial, default tag
BNF beneficiary
CND condition
DIR direction
DGR degree
FRQ frequency
LOC locative
MNR manner
PRP purpose or reason
TMP temporal
TPC topic
1 functional tag for predicate as argument
PRD predicate
6 functional tags for arguments to phrasal verbs
AS , , ,
AT ,
INTO , ,
ONTO
TO ,
TOWARDS ,
Table 1: List of functional tags
3 Complications
In this section we discuss several complications in
annotating the predicate-argument structure as de-
scribed in Section 2. Specifically, we discuss the
phenomenon of “split arguments” and the annota-
tion of nominalized verbs (or deverbal nouns).
3.1 Split Arguments
What can be characterized as “split arguments” are
cases where a constituent that occurs as one argu-
ment in one sentence can also be realized as mul-
tiple arguments (generally two) for the same pred-
icate in another sentence, without causing changes
in the meaning of the sentences. This phenomenon
surfaces in several different constructions. One such
construction involves “possessor raising”, where the
possessor (in a broad sense) raises to a higher posi-
tion. Examples 1a and 1b illustrate this. In 1a, the
possessor originates from the subject position and
raises to the topic1 position, while in 1b, the pos-
sessor originates from the object position and raises
1In Chinese, it is possible to have a topic in addition to the
subject. The topic is higher than the subject and plays an im-
portant role in the sentence (Li and Thompson, 1976).
to the subject position. The exact syntactic analysis
is not important here, and what is important is that
one argument in one sentence becomes two in an-
other. The challenge is then to capture this regularity
when annotating the predicate-argument structure of
the verb.
1. Possessor Raising
a. Subject to Topic
(IP (NP-PN-TPC /China)
(NP-TMP /last year)
(NP-SBJ /import-export
/total volume)
(VP /exceed
(QP-OBJ /325 Billion
(CLP /US. Dollar))))
/exceed
arg0-psr: /China
arg0-pse: /import-export /total volume
arg1: /325 Billion /US. Dollar
(IP (NP-TMP /last year)
(NP-SBJ (DNP (NP-PN /China)
/DE)
(NP /import-export
/volume))
(VP /exceed
(QP-OBJ /325 Billion
(CLP /US. Dollar))))
/exceed
arg0: /China /DE /import-export
/volume
arg1: /325 Billion /US. Dollar
b. Object to Subject
(IP (NP-SBJ (NP-PN /China)
(NP /economy
/expansion))
(VP (ADVP /also)
(ADVP /will)
(VP /slow down
(NP-OBJ /speed)))
/slow down
arg1-psr: /China /economy /expansion
arg1-pse: /speed
(IP (NP-SBJ (DNP (NP (NP-PN /China)
(NP /economy
/expansion))
)
(NP /speed))
(VP (ADVP /also)
(ADVP /will)
(VP /slow down))
/slow down
arg1: /China /economy /expansion
/DE /speed
Another case of “split arguments” involves the co-
ordinated noun phrases. In 2a, for example, the co-
ordinated structure as a whole is an argument to the
verb “ /sign”. In contrast, in 2b, one piece of the
argument, “ /China” is realized as a noun phrase
introduced by a preposition. There is no apparent
difference in meaning for the two sentences.
2. Coordination vs. Prepositional phrase
a. (IP (NP-PN-SBJ /Burma
/and
/China)
(VP (ADVP /already)
(VP /sign
/ASP
(NP-OBJ /border
/trade
/agreement))))
/sign
arg0: /Burma /and /China
arg1: /border /trade /agreement
b. (IP (NP-PN-SBJ /Burma)
(VP (ADVP /already)
(PP /with
(NP-PN /China))
(VP /sign
/ASP
(NP-OBJ /border
/trade
/agreement))))
/sign
arg0-crd: /Burma
arg0-crd: /China
arg1: /border /trade /agreement
There are two ways to capture this type of regu-
larity. One way is to treat each piece as a separate
argument. The problem is that for coordinated noun
phrases, there can be arbitrarily many coordinated
constituents. So we adopt the alternative approach
of representing the entire constituent as one argu-
ment. When the pieces are separate constituents,
they will receive the same argument label, with dif-
ferent secondary tags indicating they are parts of a
larger constituent. For example, in 1, when pos-
sessor raising occurs, the possessor and possessee
receive the same argument label with different sec-
ondary tags psr and pse. In 2b, both “ /China”
and “ /Burma” receive the label arg0, and the sec-
ondary label crd indicates each one is a part of the
coordinated constituent.
3.2 Nominalizations
Another complication involves nominalizations (or
deverbal nouns) and their co-occurrence with light
and not-so-light verbs. A nominalized verb, while
serving as an argument to another predicate (gen-
erally a verb), also has its own predicate-argument
structure. For example, in 3, the predicate-argument
structure for “ /doubt” should be “ ( ,
)”, where all the arguments of “ /doubt”
are embedded in the NP headed by “ /doubt”.
The complication arises when the nominalized noun
is a complement to another verb, as in 4, where
the subject “ /reader” is an argument to both
the verb “ /produce” and the nominalized verb
“ /doubt”. More interestingly, the other argument
“ /this /CL /news” is realized as an adjunct to
the verb (introduced by a preposition) even though
it bears no apparent thematic relationship to it.
It might be tempting to treat the verb
“ /develop” as a “light verb” that does not
have its own predicate-argument structure, but this
is questionable because “ /doubt” can also take a
noun that is not a nominalized verb: “ /I /towards
/she /develop /LE /feeling”. In addition,
there is no apparent difference in meaning for
“ /develop” between this sentence and 4, so there
is little basis to say these are two different senses of
this verb. So we annotate the predicate-argument
structure of both the verb “ ( , )” and the
nominalized verb “ ( , )”.
3. (IP (NP-SBJ (NP /reader)
(DNP (PP /towards
(NP (DP /this
(CLP /CL))
(NP /news)))
)
(NP /doubt))
(VP /deepen
/LE))
/deepen
arg1: /reader /towards /this /CL
/news
4. (IP (NP-SBJ /reader)
(VP (PP-DIR /towards
(NP (DP /this
(CLP /CL))
(NP /news)))
(ADVP /too)
(VP /will
(VP /develop
(NP-OBJ /doubt)))))
/develop
arg0: /reader
arg1: /doubt
4 Implementation
To implement the annotation model presented in
Section 2, we create a lexical database. Each entry is
a predicate listed with its framesets. The set of pos-
sible semantic roles for each frameset are also listed
with a mnemonic explanation. This explanation is
not part of the formal annotation. It is there to help
human annotators understand the different semantic
roles of this frameset. An annotated example is also
provided to help the human annotator.
As illustrated in Example 5, the verb “ /pass”
has three framesets, and each frameset corresponds
with a different meaning. The different meanings
can be diagnosed with diathesis alternations. For
example, when “ /pass” means “pass through”,
it allows dropped object. That is, the object does
not have to be syntactically realized. When it means
“pass by vote”, it also has an intransitive use. How-
ever, in this case, the verb demonstrates “subject of
the intransitive / object of the transitive” alternation.
That is, the subject in the intransitive use refers to
the same entity as the object in the transitive use.
When the verb means “pass an exam, test, inspec-
tion”, there is also the transitive/intransitive alterna-
tion. Only in this case, the object of the transitive
counterpart is now part of the subject in the intran-
sitive use. This is the argument-split problem dis-
cussed in the last section. The three framesets, rep-
resenting three senses, are illustrated in 5.
5. Verb: /pass
Frameset.01: , /pass through
Roles: arg0(“passer”), arg1(“place”)
Example:
(IP (NP-SBJ /train)
(VP (ADVP /now)
(VP /pass
(NP-OBJ /tunnel))))
.01/pass
arg0: /train
arg1: /tunnel
argM-ADV: /now
(IP (NP-SBJ /train)
(VP (ADVP /now)
(VP /pass)))
.01/pass
arg0: /train
argM-ADV: /now
Frameset.02: , ( , )/pass
(an exam, etc.)
(IP (NP-SBJ (DNP (NP /he)
/DE)
(NP /drug inspection))
(VP (ADVP /not)
(VP /pass)))
.02/pass
arg1: /he /DE /drug inspection
(IP (NP-SBJ (NP /he)
(VP (ADVP /not)
(VP /pass)))
(NP-OBJ /drug inspection))
.02/pass
arg1-psr: /he
arg1-pse: /drug inspection
Frameset.03: /pass (a bill, a law, etc.)
(IP (NP-PN-SBJ /the U.S.
/Congress)
(VP (NP-TMP /recently)
(VP /pass
/ASP
(NP-OBJ /interstate
/banking law))))
.03/pass
arg0: /the U.S.
arg1: /interstate /banking law
(IP (NP-SBJ (ADJP /interstate)
(NP /banking law))
(VP (NP-TMP /recently)
(VP /pass
/ASP)))
.03/pass
arg1: /interstate /banking law
The human annotator can use the information
specified in this entry to annotate all instances of
“ /pass” in a corpus. When annotating a predicate
instance, the annotator first determines the syntactic
frame of the predicate instance, and then determine
which frameset this frame instantiates. The frame-
set identification is then attached to this predicate
instance. This can be broadly construed as “sense-
tagging”, except that this type of sense tagging is
coarser, and the “senses” are based on structural dis-
tinctions rather than just semantic nuances. A dis-
tinction is made only when the semantic distinc-
tions also coincide with some structural distinctions.
The expectation is that this type of sense tagging is
much amenable to automatic machine-learning ap-
proaches. The annotation does not stop here. The
annotator will go on identifying the arguments and
adjuncts for this predicate instance. For the argu-
ments, the annotator will determine which semantic
role each argument realizes, based on the set of pos-
sible roles for this frameset, and attach the appropri-
ate semantic role label (argN) to it. For adjuncts, the
annotator will determine the type of adjunct this is
and attach a secondary tag to argM.
5 Applications
A resource annotated with predicate-argument struc-
ture can be used for a variety of natural language
applications. For example, this level of abstraction
is useful for Information Extraction. The argument
role labels can be easily mapped to an Information
Extraction template, where each role is mapped to a
piece of information that an IE system is interested
in. Such mapping will not be as straightforward if
it is between surface syntactic entities such as the
subject and IE templates.
This level of abstraction can also provide a plat-
form where lexical transfer can take place. It opens
up the possibility of linking a frameset of a predi-
cate in one language with that of another, rather than
using bilingual (or multilingual) dictionaries where
one word is translated into one or more words in a
different language. This type of lexical transfer has
several advantages. One is that the transfer is made
more precise, in the sense that there will be more
cases where one-to-one mapping is possible. Even
in cases where one-to-one mapping is still not possi-
ble, the identification of the framesets of a predicate
will narrow down the possible lexical choices. For
example, sign.02 in the English Proposition Bank
(Kingsbury and Palmer, 2002) will be linked to “
.01/enter into an agreement”. This type of linking
rules out “ ” as a possible translation for sign.02,
even though it is a translation for other framesets of
the word sign.
The transfer will also be more precise in another
sense, that is, the predicate-argument structure of a
word instance will be preserved during the trans-
fer process. Knowing the arguments of a predicate
instance can further constrain the lexical choices
and rule out translation candidates whose predicate-
argument structures are incompatible. For example,
if the realized arguments of “sign.01” of the En-
glish Proposition Bank in a given sentence are the
signer, the document, and the signature, among the
translation candidates “ , ” (“ .01/enter into
an agreement” is ruled out as a possibility for this
frameset), only “ ” is possible, because “ ” can
only take two arguments, namely, the signer and the
document.
6. /he /at /this /CL /document /LC /sign
/LE /self /DE /name
“He signed his name on this document.”
One might argue that the syntactic subcategoriza-
tion frame obtained from the syntactic parse tree
can also constrain the lexical choices. For exam-
ple, knowing that “sign” has a subject, an object
and a prepositional phrase should be enough to rule
out “ ” as a possible translation. This argument
breaks down when there are lexical divergences.
The “document” argument of “ ” can only be re-
alized as a prepositional phrase in Chinese while
in English it can only be realized the direct object
of “sign”. If the syntactic subcategorization frame
is used to constrain the lexical choices for “sign”,
“ ” will be incorrectly ruled out as a possible
translation. There will be no such problem if the
more abstract predicate-argument structure is used
for this purpose. Even when the document is re-
alized as a prepositional phrase, it is still the same
argument. Of course, “ /sign” is also a possi-
ble translation. So compared with the surface syn-
tactic frames, the predicate-argument structure con-
strains the lexical choices without incorrectly ruling
out legitimate translation candidates. This is under-
standable because the predicate-structure abstracts
away from the syntactic idiosyncracies of the differ-
ent languages and thus are more transferable across
languages.
7. /he /at /this /CL /document /LC /sign
/he /sign /this /CL /document
“He signed this document.”
Annotating the predicate-argument structure as
described in previous sections will not reduce the
lexical choices to one-to-one mappings in call cases.
For example, “ ” can be translated into “standard-
ize” or “unite”, even though there is only one frame-
set for both finer senses of this verb. It is conceiv-
able that one might want to posit two framesets, each
for one finer sense of this verb. This is essentially
a trade-off: either one can conduct deep analysis
of the source language, resolve all sense ambigui-
ties on the source side and have a more straightfor-
ward mapping, or one takes the one-to-many map-
pings and select the correct translation on the tar-
get language side. Hopefully, the annotation of the
predicate-argument provides just the right level of
abstraction and the resource described here, with
each predicate annotated with its arguments and ad-
juncts in context, enables the automatic acquisition
of the predicate-argument structure.
6 Summary
In this paper, we described an approach to annotate
the propositions in the Penn Chinese Treebank. We
described how diathesis alternation patterns can be
used to make coarse sense distinctions for Chinese
verbs as a necessary step in annotating the predicate-
structure of predicates. We also described the repre-
sentation scheme we use to label the semantic argu-
ments and adjuncts of the predicates. We discussed
several complications for this type of annotation and
described our solutions. We then discussed how a
lexical database with predicate-argument structure
information can be used to ensure consistent annota-
tion. Finally, we discussed possible applications for
this resource.
7 Acknowledgement
This work is supported by MDA904-02-C-0412.

References
Daniel M. Bikel and David Chiang. 2000. Two statisti-
cal parsing models applied to the chinese treebank. In
Proceedings of the 2nd Chinese Language Processing
Workshop, Hong Kong, China.
Eugene Charniak. 2000. A Maximum-Entropy-Inspired
Parser. In Proc. of NAACL-2000.
Keh-Jiann Chen, Chu-Ren Huang, Feng-Yi Chen, Chi-
Ching Luo, Ming-Chung Chang, and Chao-Jan Chen.
to appear. Sinica Treebank: Design Criteria, rep-
resentational issues and immplementation. In Anne
Abeille, editor, Building and Using Syntactically An-
notated Corpora. Kluwer.
David Chiang. 2000. Statisitical parsing with an
automatically-extracted tree adjoining grammar. In
Proceedings of the 38th Annual Meeting of the Asso-
ciation for Computational Linguistics, pages 456-463,
Hong Kong.
Mike Collins. 1997. Three Generative, Lexicalised Mod-
els for Statistical Parsing. In Proc. of ACL-1997.
Mike Collins. 2000. Discriminative Reranking for Natu-
ral Language Parsing. In Proc. of ICML-2000.
N. Ide and J. Vronis. 1998. Word sense disambigua-
tion: The state of the art. Computational Linguistics,
24(1):1–40.
Paul Kingsbury and Martha Palmer. 2002. From tree-
bank to propbank. In Proceedings of the 3rd Interna-
tional Conference on Language Resources and Evalu-
ation (LREC2002), Las Palmas, Spain.
Beth Levin. 1993. English Verbs and Alternations: A
Preliminary Investigation. Chicago: The Unversity of
Chicago Press.
Charles Li and Sandra Thompson. 1976. Subject and
Topic: A new typology of language. In Charles Li,
editor, Subject and Topic. Academic Press.
M. Marcus, B. Santorini, and M. A. Marcinkiewicz.
1993. Building a Large Annotated Corpus of English:
the Penn Treebank. Computational Linguistics.
Martha Palmer, Hoa Trang Dang, and Christiane Fell-
baum. submitted. Making fine-grained and coarse-
grained sense distinctions, both manually and auto-
matically. Journal of Natural Language Engineering.
Fei Xia, Martha Palmer, Nianwen Xue, Mary Ellen
Okurowski, John Kovarik, Fu-Dong Chiou, Shizhe
Huang, Tony Kroch, and Mitch Marcus. 2000. Devel-
oping Guidelines and Ensuring Consistency for Chi-
nese Text Annotation. In Proc. of the 2nd Interna-
tional Conference on Language Resources and Evalu-
ation (LREC-2000), Athens, Greece.
