﻿Proceedings of the ACL-SIGLEX Workshop on Deep Lexical Acquisition, pages 1–9,
Ann Arbor, June 2005. c©2005 Association for Computational Linguistics
Data Homogeneity and Semantic Role Tagging in Chinese 
Oi Yee Kwong and Benjamin K. Tsou 
Language Information Sciences Research Centre 
City University of Hong Kong 
Tat Chee Avenue, Kowloon, Hong Kong 
{rlolivia, rlbtsou}@cityu.edu.hk 
 
 
Abstract 
This paper reports on a study of semantic 
role tagging in Chinese in the absence of a 
parser.  We tackle the task by identifying 
the relevant headwords in a sentence as a 
first step to partially locate the corre-
sponding constituents to be labelled.  We 
also explore the effect of data homogene-
ity by experimenting with a textbook cor-
pus and a news corpus, representing 
simple data and complex data respectively.  
Results suggest that while the headword 
location method remains to be improved, 
the homogeneity between the training and 
testing data is important especially in 
view of the characteristic syntax-
semantics interface in Chinese.  We also 
plan to explore some class-based tech-
niques for the task with reference to exist-
ing semantic lexicons, and to modify the 
method and augment the feature set with 
more linguistic input. 
1 Introduction 
As the development of language resources pro-
gresses from POS-tagged corpora to syntactically 
annotated treebanks, the inclusion of semantic in-
formation such as predicate-argument relations 
becomes indispensable.  The expansion of the Penn 
Treebank into a Proposition Bank (Kingsbury and 
Palmer, 2002) is a typical move in this direction.  
Lexical resources also need to be enhanced with 
semantic information (e.g. Fellbaum et al., 2001).  
The ability to identify semantic role relations cor-
rectly is essential to many applications such as in-
formation extraction and machine translation; and 
making available resources with this kind of in-
formation would in turn facilitate the development 
of such applications. 
Large-scale production of annotated resources 
is often labour intensive, and thus calls for auto-
matic labelling to streamline the process.  The task 
is essentially done in two phases, namely recognis-
ing the constituents bearing some semantic rela-
tionship to the target verb in a sentence, and then 
labelling them with the corresponding semantic 
roles. 
In their seminal proposal, Gildea and Jurafsky 
(2002) approached the task using various features 
such as headword, phrase type, and parse tree path.  
While such features have remained the basic and 
essential features in subsequent research, parsed 
sentences are nevertheless required, for extracting 
the path features during training and providing the 
argument boundaries during testing.  The parse 
information is deemed important for the perform-
ance of role labelling (Gildea and Palmer, 2002; 
Gildea and Hockenmaier, 2003). 
More precisely, parse information is rather 
more critical for the identification of boundaries of 
candidate constituents than for the extraction of 
training data.  Its limited function in training, for 
instance, is reflected in the low coverage reported 
(e.g. You and Chen, 2004).  As full parses are not 
always accessible, many thus resort to shallow syn-
tactic information from simple chunking, even 
though results often turn out to be less satisfactory 
than with full parses. 
This limitation is even more pertinent for the 
application of semantic role labelling to languages 
which do not have sophisticated parsing resources.  
In the case of Chinese, for example, there is con-
1
siderable variability in its syntax-semantics inter-
face; and when one comes to more nested and 
complex sentences such as those from news arti-
cles, it becomes more difficult to capture the sen-
tence structures by typical examples. 
Thus in the current study, we approach the 
problem in Chinese in the absence of parse infor-
mation, and attempt to identify the headwords in 
the relevant constituents in a sentence to be tagged 
as a first step.  In addition, we will explore the ef-
fect of training on different datasets, simple or 
complex, to shed light on the relative importance 
of parse information for indicating constituent 
boundaries in semantic role labelling. 
In Section 2, related work will be reviewed.  In 
Section 3, the data used in the current study will be 
introduced.  Our proposed method will be ex-
plained in Section 4, and the experiment reported 
in Section 5.  Results and future work will be dis-
cussed in Section 6, followed by conclusions in 
Section 7. 
 
2 Related Work 
The definition of semantic roles falls on a contin-
uum from abstract ones to very specific ones.  
Gildea and Jurafsky (2002), for instance, used a set 
of roles defined according to the FrameNet model 
(Baker et al., 1998), thus corresponding to the 
frame elements in individual frames under a par-
ticular domain to which a given verb belongs.  
Lexical entries (in fact not limited to verbs, in the 
case of FrameNet) falling under the same frame 
will share the same set of roles.  Gildea and Palmer 
(2002) defined roles with respect to individual 
predicates in the PropBank, without explicit nam-
ing.  To date PropBank and FrameNet are the two 
main resources in English for training semantic 
role labelling systems, as in the CoNLL-2004 
shared task (Carreras and Màrquez, 2004) and 
SENSEVAL-3 (Litkowski, 2004). 
The theoretical treatment of semantic roles is 
also varied in Chinese.  In practice, for example, 
the semantic roles in the Sinica Treebank mark not 
only verbal arguments but also modifier-head rela-
tions (You and Chen, 2004).  In our present study, 
we go for a set of more abstract semantic roles 
similar to the thematic roles for English used in 
VerbNet (Kipper et al., 2002).  These roles are 
generalisable to most Chinese verbs and are not 
dependent on particular predicates.  They will be 
further introduced in Section 3. 
Approaches in automatic semantic role label-
ling are mostly statistical, typically making use of 
a number of features extracted from parsed training 
sentences.  In Gildea and Jurafsky (2002), the fea-
tures studied include phrase type (pt), governing 
category (gov), parse tree path (path), position of 
constituent with respect to the target predicate (po-
sition), voice (voice), and headword (h).  The la-
belling of a constituent then depends on its 
likelihood to fill each possible role r given the fea-
tures and the target predicate t, as in the following, 
for example: 
 
),,,,,|( tvoicepositiongovpthrP    
 
Subsequent studies exploited a variety of im-
plementation of the learning component.  Trans-
formation-based approaches were also used (e.g. 
see Carreras and Màrquez (2004) for an overview 
of systems participating in the CoNLL shared task).  
Swier and Stevenson (2004) innovated with an un-
supervised approach to the problem, using a boot-
strapping algorithm, and achieved 87% accuracy. 
While the estimation of the probabilities could 
be relatively straightforward, the trick often lies in 
locating the candidate constituents to be labelled.  
A parser of some kind is needed.  Gildea and 
Palmer (2002) compared the effects of full parsing 
and shallow chunking; and found that when con-
stituent boundaries are known, both automatic 
parses and gold standard parses resulted in about 
80% accuracy for subsequent automatic role tag-
ging, but when boundaries are unknown, results 
with automatic parses dropped to 57% precision 
and 50% recall.  With chunking only, performance 
further degraded to below 30%.  Problems mostly 
arise from arguments which correspond to more 
than one chunk, and the misplacement of core ar-
guments.  Sun and Jurafsky (2004) also reported a 
drop in F-score with automatic syntactic parses 
compared to perfect parses for role labelling in 
Chinese, despite the comparatively good results of 
their parser (i.e. the Collins parser ported to Chi-
nese).  The necessity of parse information is also 
reflected from recent evaluation exercises.  For 
instance, most systems in SENSEVAL-3 used a 
parser to obtain full syntactic parses for the sen-
tences, whereas systems participating in the 
CoNLL task were restricted to use only shallow 
2
syntactic information.  Results reported in the for-
mer tend to be higher.  Although the dataset may 
be a factor affecting the labelling performance, it 
nevertheless reinforces the usefulness of full syn-
tactic information. 
According to Carreras and Màrquez (2004), for 
English, the state-of-the-art results reach an F
1
 
measure of slightly over 83 using gold standard 
parse trees and about 77 with real parsing results.  
Those based on shallow syntactic information is 
about 60. 
In this work, we study the problem in Chinese, 
treating it as a headword identification and label-
ling task in the absence of parse information, and 
examine how the nature of the dataset could affect 
the role tagging performance. 
3 The Data 
3.1 Materials 
In this study, we used two datasets: sentences from 
primary school textbooks were taken as examples 
for simple data, while sentences from a large cor-
pus of newspaper texts were taken as complex ex-
amples. 
Two sets of primary school Chinese textbooks 
popularly used in Hong Kong were taken for refer-
ence.  The two publishers were Keys Press and 
Modern Education Research Society Ltd.  Texts 
for Primary One to Six were digitised, segmented 
into words, and annotated with parts-of-speech 
(POS).  This results in a text collection of about 
165K character tokens and upon segmentation 
about 109K word tokens (about 15K word types).  
There were about 2,500 transitive verb types, with 
frequency ranging from 1 to 926. 
The complex examples were taken from a sub-
set of the LIVAC synchronous corpus
1
 (Tsou et al., 
2000; Kwong and Tsou, 2003).   The subcorpus 
consists of newspaper texts from Hong Kong, in-
cluding local news, international news, financial 
news, sports news, and entertainment news, col-
lected in 1997-98.  The texts were segmented into 
words and POS-tagged, resulting in about 1.8M 
character tokens and upon segmentation about 1M 
word tokens (about 47K word types).  There were 
about 7,400 transitive verb types, with frequency 
ranging from 1 to just over 6,300. 
                                                           
                                                          
1
 http://www.livac.org 
3.2 Training and Testing Data 
For the current study, a set of 41 transitive verbs 
common to the two corpora (hereafter referred to 
as textbook corpus and news corpus), with fre-
quency over 10 and over 50 respectively, was 
sampled.   
Sentences in the corpora containing the sam-
pled verbs were extracted.  Constituents corre-
sponding to semantic roles with respect to the 
target verbs were annotated by a trained human 
annotator and the annotation was verified by an-
other.  In this study, we worked with a set of 11 
predicate-independent abstract semantic roles. 
According to the Dictionary of Verbs in Contem-
porary Chinese (Xiandai Hanyu Dongci Dacidian, 
現代漢語動詞大詞典 – Lin et al., 1994), our se-
mantic roles include the necessary arguments for 
most verbs such as agent and patient, or goal and 
location in some cases; and some optional argu-
ments realised by adjuncts, such as quantity, in-
strument, and source.  Some examples of semantic 
roles with respect to a given predicate are shown in 
Figure 1. 
Altogether 980 sentences covering 41 verb 
types in the textbook corpus were annotated, re-
sulting in 1,974 marked semantic roles (constitu-
ents); and 2,122 sentences covering 41 verb types 
in the news corpus were annotated, resulting in 
4,933 marked constituents
2
. 
The role labelling system was trained on 90% 
of the sample sentences from the textbook corpus 
and the news corpus separately; and tested on the 
remaining 10% of both corpora.   
4 Automatic Role Labelling 
The automatic labelling was based on the statistical 
approach in Gildea and Jurafsky (2002).  In Sec-
tion 4.1, we will briefly mention the features used 
in the training process.  Then in Sections 4.2 and 
4.3, we will explain our approach for locating 
headwords in candidate constituents associated 
with semantic roles, in the absence of parse infor-
mation. 
 
2
 These figures only refer to the samples used in the current 
study.  In fact over 35,000 sentences in the LIVAC corpus 
have been semantically annotated, covering about 1,500 verb 
types and about 80,000 constituents were marked. 
3
4.1 Training 
In this study, our probability model was based 
mostly on parse-independent features extracted 
from the training sentences, namely: 
 
Headword (head): The headword from each con-
stituent marked with a semantic role was identified.  
For example, in the second sentence in Figure 1, 
學校 (school) is the headword in the constituent 
corresponding to the agent of the verb 舉行 (hold), 
and 比賽 (contest) is the headword of the noun 
phrase corresponding to the patient. 
 
Position (posit): This feature shows whether the 
constituent being labelled appears before or after 
the target verb.  In the first example in Figure 1, 
the experiencer and time appear on the left of the 
target, while the theme is on its right. 
 
POS of headword (HPos): Without features pro-
vided by the parse, such as phrase type or parse 
tree path, the POS of the headword of the labelled 
constituent could provide limited syntactic infor-
mation. 
 
Preposition (prep): Certain semantic roles like 
time and location are often realised by preposi-
tional phrases, so the preposition introducing the 
relevant constituents would be an informative fea-
ture. 
 
Hence for automatic labelling, given the target 
verb t, the candidate constituent, and the above 
features, the role r which has the highest probabil-
ity for P(r | head, posit, HPos, prep, t) will be as-
signed to that constituent.  In this study, however, 
we are also testing with the unknown boundary 
condition where candidate constituents are not 
available in advance.  To start with, we attempt to 
partially locate them by identifying their head-
words first, as explained in the following sections.  
 
 
 
Figure 1  Examples of semantic roles with respect to a given predicate 
 
 
4.2 Locating Candidate Headwords 
In the absence of parse information, and with con-
stituent boundaries unknown, we attempt to par-
tially locate the candidate constituents by 
identifying their corresponding headwords first.  
Sentences in our test data were segmented into 
words and POS-tagged.  We thus divide the recog-
nition process into two steps, locating the head-
word of a candidate constituent first, and then 
expanding from the headword to determine its 
boundaries. 
Student 
下 星期 學校 舉行 講 故事 比賽 
Next week school hold tell story contest 
Time Agent Target Patient 
Example: (Next week, the school will hold a story-telling contest.) 
同學 們 作文 常常 
感到沒 可 
(-pl) write essay always feel (neg) anything 
Experiencer Target Theme
Example: (Students always feel there is nothing to write about for their essays.) 
時   ，
time 
什麼 寫 
can 
Time 
write 
4
Basically, if we consider every word in the 
same sentence with the target verb (both to its left 
and to its right) a potential headword for a candi-
date constituent, what we need to do is to find out 
the most probable words in the sentence to match 
against individual semantic roles.  We start with a 
feature set with more specific distributions, and 
back off to feature sets with less specific distribu-
tions
3
.  Hence in each round we look for 
 
)|(maxarg setfeaturerP
r
 
 
for every candidate word.  Ties are resolved by 
giving priority to the word nearest to the target 
verb in the sentence. 
Figure 2 shows an example illustrating the pro-
cedures for locating candidate headwords.  The 
target verb is 發現 (discover).  In the first round, 
using features head, posit, HPos, and t, 時候 (time) 
and 問題 (problem) were identified as Time and 
Patient respectively.  In the fourth subsequent 
round, backing off with features posit and HPos, 
我們 (we) was identified as a possible Agent.  In 
this round a few other words were identified as 
potential Patients.  However, they would not be 
considered since Patient was already located in a 
previous round.  So in the end the headwords iden-
tified for the test sentence are 我們 for Agent, 問
題 for Patient and 時候 for Time. 
4.3 Constituent Boundary 
Upon the identification of headwords for potential 
constituents, the next step is to expand from these 
headwords for constituent boundaries.  Although 
we are not doing this step in the current study, it 
can potentially be done via some finite state tech-
niques, or better still, with shallow syntactic proc-
essing like simple chunking if available. 
                                                           
3
 In this experiment, we back off in the following order: 
P(r|head, posit, HPos, prep t), P(r|head, posit, t), P(r | head, t), 
P(r | HPos, posit, t), P(r | HPos, t).  However, the prep feature 
becomes obsolete when constituent boundaries are unknown. 
5 The Experiment 
5.1 Testing 
The system was trained on the textbook corpus and 
the news corpus separately, and tested on both cor-
pora (the data is homogeneous if the system is 
trained and tested on materials from the same 
source).  The testing was done under the “known 
constituent” condition and “unknown constituent” 
condition.  The former essentially corresponds to 
the known-boundary condition in related studies; 
whereas in the unknown-constituent condition, 
which we will call “headword location” condition 
hereafter, we tested our method of locating candi-
date headwords as explained above in Section 4.2.  
In this study, every noun, verb, adjective, pronoun, 
classifier, and number within the test sentence con-
taining the target verb was considered a potential 
headword for a candidate constituent correspond-
ing to some semantic role.  The performance was 
measured in terms of the precision (defined as the 
percentage of correct outputs among all outputs), 
recall (defined as the percentage of correct outputs 
among expected outputs), and F
1
 score which is the 
harmonic mean of precision and recall. 
5.2 Results 
The results are shown in Tables 1 and 2, for train-
ing on homogeneous dataset and different dataset 
respectively, and testing under the known constitu-
ent condition and the headword location condition. 
When trained on homogeneous data, the results 
were good on both datasets under the known con-
stituent condition, with an F
1
 score of about 90.  
This is comparable or even better to the results re-
ported in related studies for known boundary con-
dition.  The difference is that we did not use any 
parse information in the training, not even phrase 
type.  When trained on a different dataset, however, 
the accuracy was maintained for textbook data, but 
it decreased for news data, for the known constitu-
ent condition. 
For the headword location condition, the per-
formance in general was expectedly inferior to that 
for the known constituent condition.  Moreover, 
this degradation seemed to be quite consistent in 
most cases, regardless of the nature of the training 
set.  In fact, despite the effect of training set on 
news data, as mentioned above, the degradation 
5
Sentence: 
溫習的時候，我們發現了許多平時沒有想到，或是未能解決的問題，於是就去問爸爸
During revision, we discover a lot o
。 
f problems which we have not thought of or cannot be 
solved, then we go and ask father. 
Candidate  Round 1       … Round 4    Final Result 
eadwords 
n) 
H
 
溫習 (revisio    Patient
時候 (time)  Time            ----       Time 
我們 (we)    Agent   Agent 
k)    Patient
平時 (normally) 
想到 (thin
能 (can) 
解決 (solve)    Patient
問題 (problem)  Patient    ----       Patient 
去 (go)     Patient
問(ask) Patient
from known constituent to headword location is 
nevertheless the least fo
爸爸 (father)    Patient
r news data when trained 
on 
remature at this stage, given the considerable dif-
 
 
 
Figure 2  Example illustrating the procedures for locating candidate headwords 
  
 
Tex ata News a 
different materials.   
Hence the effect of training data is only obvious 
in the news corpus.  In other words, both sets of 
training data work similarly well with textbook test 
data, but the performance on news test data is 
worse when trained on textbook data.  This is un-
derstandable as the textbook data contain fewer 
examples and the sentence structures are usually 
much simpler than those in newspapers.  Hence the 
system tends to miss many secondary roles like 
location and time, which are not sufficiently repre-
sented in the textbook corpus.  The conclusion that 
training on news data gives better result might be 
ference in the corpus size of the two datasets.  
Nevertheless, the deterioration of results on text-
book sentences, even when trained on news data, is 
simply reinforcing the importance of data homoge-
neity, if nothing else.  More on data homogeneity 
will be discussed in the next section. 
p
In addition, the surprisingly low precision under 
the headword location condition is attributable to a 
technical inadequacy in the way we break ties.  In 
this study we only make an effort to eliminate mul-
tiple tagging of the same role to the same target 
verb in a sentence on either side of the target verb, 
but not if they appear on both sides of the target 
verb.  This should certainly be dealt with in future 
experiments. 
 
 
 
 
tbook D Dat
P  Precisionrecision Recall F
1
Recall F
1
Known Constituent 93.85 87.50 90.56 90.49 87.70 89.07 
Headword Location 46.12 61.98 52.89 38.52 52.25 44.35 
 
Table 1  Results for Training on Homogeneous Datasets 
 
 
6
 
Tex ata News a tbook D Dat
P  Precisionrecision Recall F
1
Recall F
1
Known Constituent 91.85 88.02 89.86 80.30 66.80 72.93 
Headword Location 38.87 57.29 46.32 37.89 42.01 39.84 
 
Table 2  Results for Training on Different Datasets 
 
6 Discussion 
 
hen
cuss this below in relation to 
n’, duration as in 
吃三
d the parse information 
wo
verb 進行, being very 
pol
he design 
he feature set should benefit 
 m nalysis and input. 
 
6.1 Role of Parse Information 
According to Carreras and Màrquez (2004), the 
state-of-the-art results for semantic role labelling 
systems based on shallow syntactic information is 
about 15 lower than those with access to gold stan-
dard parse trees, i.e., around 60.  With homogene-
ous training and testing data, our experimental 
results for the headword location condition, with 
no syntactic information available at all, give an F
1
 
score of 52.89 and 44.35 respectively for textbook 
data and news data.  Such results are in line with 
and comparable to those reported for the unknown 
boundary condition with automatic parses in 
Gildea and Palmer (2002), for instance.  Moreover, 
when they used simple chunks instead of full 
parses, the performance resulted in a drop to below 
50% precision and 35% recall with relaxed scoring,
ce their conclusion on the necessity of a parser. 
The more degradation in performance observed 
in the news data is nevertheless within expectation, 
and it suggests that simple and complex data seem 
to have varied dependence on parse information.  
We will further dis
data homogeneity. 
6.2 Data Homogeneity 
The usefulness of parse information for semantic 
role labelling is especially interesting in the case of 
Chinese, given the flexibility in its syntax-
semantics interface (e.g. the object after 吃 ‘eat’ 
could refer to the patient as in 吃蘋果 ‘eat apple’, 
location as in 吃食堂 ‘eat cantee
年 ‘eat three years’, etc.).   
  As reflected from the results, the nature of 
training data is obviously more important for the 
news data than the textbook data; and the main 
reason might be the failure of the simple training 
data to capture the many complex structures of the 
news sentences, as we suggested earlier.  The rela-
tive flexibility in the syntax-semantics interface of 
Chinese is particularly salient; hence when a sen-
tence gets more complicated, there might be more 
intervening constituents an
uld be useful to help identify the relevant ones 
in semantic role labelling. 
With respect to the data used in the experiment, 
we tried to explore the complexity in terms of the 
average sentence length and number of semantic 
role patterns exhibited.  For the news data, the av-
erage sentence length is around 59.7 characters 
(syllables), and the number of semantic role pat-
terns varies from 4 (e.g. 打算 ‘to plan’) to as many 
as 25 (e.g. 進行 ‘to proceed with some action’), 
with an average of 9.5 patterns per verb.  On the 
other hand, the textbook data give an average sen-
tence length of around 39.7 characters, and the 
number of semantic role patterns only varies from 
1 (e.g. 決定 ‘to decide’) to 11 (e.g. 舉行 ‘to hold 
some event’), with an average of 5.1 patterns per 
verb.  Interestingly, the 
ymorphous in news texts, only shows 5 differ-
ent patterns in textbooks. 
Thus the nature of the dataset for semantic role 
labelling is worth further investigation.  T
of the method and t
from ore linguistic a
6.3 Future Work 
In terms of future development, apart from improv-
ing the handling of ties in our method, as men-
tioned above, we plan to expand our work in 
several respects.  The major part would be on the 
generalization to unseen headwords and unseen 
predicates.  As is with other related studies, the 
examples available for training for each target verb 
are very limited; and the availability of training 
data is also insufficient in the sense that we cannot 
expect them to cover all target verb types.  Hence 
7
it is very important to be able to generalize the 
process to unseen words and predicates.  To this 
end we will experiment with a semantic lexicon 
like Tongyici Cilin (同義詞詞林, a Chinese the-
sau
re of 
Chinese, we intend to improve our method and 
re linguistic consideration. 
 
 
 semantic lexicons, 
and to modify the method and augment the feature 
set with more linguistic input. 
This work is supported by Competitive Earmarked 
Research Grants (CERG) of the Research Grants 
Hong Kong under grant Nos. 
R
Ba
 the 
Ca troduction to the 
Fe
Resources, Invited Talk, Pittsburg, PA. 
Gi D. and Palmer, M. (2002)  The Necessity of 
Gi kenmaier, J. (2003)  Identifying Se-
Kw
r Part-of-speech 
Tagging. In Proceedings of the Research Note Ses-
sion of the 10th Conference of the European Chapter 
rus) in both training and testing, which we ex-
pect to improve the overall performance. 
Another area of interest is to look at the behav-
iour of near-synonymous predicates in the tagging 
process.  Many predicates may be unseen in the 
training data, but while the probability estimation 
could be generalized from near-synonyms as sug-
gested by a semantic lexicon, whether the similar-
ity and subtle differences between near-synonyms 
with respect to the argument structure and the cor-
responding syntactic realisation could be distin-
guished would also be worth studying.  Related to 
this is the possibility of augmenting the feature set.  
Xue and Palmer (2004), for instance, looked into 
new features such as syntactic frame, lexicalized 
constituent type, etc., and found that enriching the 
feature set improved the labelling performance.  In 
particular, given the importance of data homogene-
ity as observed from the experimental results, and 
the challenges posed by the characteristic natu
feature set with mo
7 Conclusion 
The study reported in this paper has thus tackled 
semantic role labelling in Chinese in the absence of 
parse information, by attempting to locate the cor-
responding headwords first.  We experimented 
with both simple and complex data, and have ex-
plored the effect of training on different datasets. 
Using only parse-independent features, our results 
under the known boundary condition are compara-
ble to those reported in related studies.  The head-
word location method can be further improved. 
More importantly, we have observed the impor-
tance of data homogeneity, which is especially sa-
lient given the relative flexibility of Chinese in its 
syntax-semantics interface.  As a next step, we 
plan to explore some class-based techniques for the 
task with reference to existing
 
Acknowledgements 
Council of 
CityU1233/01H and CityU1317/03H. 
 
References 
Baker, C.F., Fillmore, C.J. and Lowe, J.B. (1998)  The 
Berkeley FrameNet Project.  In Proceedings of
36th Annual Meeting of the Association for Computa-
tional Linguistics and the 17th International Confer-
ence on Computational Linguistics (COLING-
ACL ’98), Montreal, Quebec, Canada, pp.86-90. 

Carreras, X. and Màrquez, L. (2004)  In
CoNLL-2004 Shared Task: Semantic Role Labeling.  
In Proceedings of the Eighth Conference on Compu-
tational Natural Language Learning (CoNLL-2004), 
Boston, Massachusetts, pp.89-97. 

Fellbaum, C., Palmer, M., Dang, H.T., Delfs, L. and 
Wolf, S. (2001)  Manual and Automatic Semantic 
Annotation with WordNet.  In Proceedings of the 
NAACL-01 SIGLEX Workshop on WordNet and 
Other Lexical 

Gildea, D. and Jurafsky, D. (2002)  Automatic Labeling 
of Semantic Roles.  Computational Linguistics, 28(3): 
245-288. 

Gildea, D. and Palmer, M. 2002. The necessity of 
Parsing for Predicate Argument Recognition.  In Pro-
ceedings of the 40th Meeting of the Association for 
Computational Linguistics (ACL-02), Philadelphia, 
PA. 

Gildea, D. and Hockenmaier, J. 2003. Identifying Semantic Roles Using Combinatory Categorial Gram-
mar.  In Proceedings of the 2003 Conference on 
Empirical Methods in Natural Language Processing, 
Sapporo, Japan. 

Kingsbury, P. and Palmer, M. (2002)  From TreeBank 
to PropBank.  In Proceedings of the Third Confer-
ence on Language Resources and Evaluation (LREC-
02), Las Palmas, Canary Islands, Spain. 

Kipper, K., Palmer, M. and  Rambow, O. (2002)  Ex-
tending PropBank with VerbNet Semantic Predicates.  
In Proceedings of the AMTA-2002 Workshop on Ap-
plied Interlinguas, Tiburon, CA. 

Kwong, O.Y. and Tsou, B.K. (2003) Categorial Fluidity 
in Chinese and its Implications for part-of-speech tagging.
EACL.

Lin, X., Wang, L. and Sun, D. Li (1994)  Dictionary of 
Verbs in Contemporary Chinese.  Beijing Language 
and Culture University Press. 

Litkowski, K.C. (2004) SENSEVAL-3 Task: Automatic 
Labeling of Semantic Roles.  In Proceedings of the 
Third International Workshop on the Eval
Systems for the Semantic Analysis of Text 
SENSEVAL-3), Barcelona, Spain, pp.9-12. 

Sun, H. and Jurafsky, D. (2004)  Shallow Semantic 
Parsing of Chinese.  HLT NAACL 2004.

Swier, R.S. and Stevenson, S. (2004)  Unsupervised 
Semantic Role Labelling.  In Proceedings of the 
2004 Conference on Empirical Methods in Natural 
Language Processing, Barcelona, Spain, pp.95-102. 

Tsou, B.K., Tsoi, W.F., Lai, T.B.Y., Hu, J. and Chan, 
S.W.K. (2000)  LIVAC, A Chinese Synchronous 
Corpus, and Some Applications.  In Proceedings of 
the ICCLC International Conference on Chinese 
Language Computing, Chicago, pp. 233-238. 

Xue, N. and Palmer, M. (2004)  Calibrating Features for 
Semantic Role Labeling.  In Proceedings of the 
Conference on Empirical Methods in Natural Lan-
guage Processing, Barcelona, Spain, pp.88-94. 

You, J-M. and Chen, K-J. (2004)  Automatic Semantic 
Role Assignment for a Tree Structure.  In Proceed-
ings of the 3rd SigHAN Workshop on Chinese Lan-
guage Processing, ACL-04.

Qisi Zhongguo Yuwen.  Primary 1-6, 24 
volumes, 2004.  Hong Kong: Keys Press. 

Xiandai Zhonguo Yuwen. Primary 1-6, 24 volumes. 2004. Hong Kong: modern education research society.
