Proceedings of the 
Second Chinese Language 
Processing Workshop 
Held in conjunction with 
The 38th Annual Meeting of the 
Association for Computational Linguistics 
Edited by 
Martha Palmer 
Mitch Marcus 
Aravind Joshi 
Fei Xia 
8 October 2000 
Hong Kong University of Science and Technology (HKUST) 
Hong Kong 
©2000 The Association for Computational Linguistics
Order copies of this and other ACL workshop proceedings from: 
Association for Computational Linguistics (ACL) 
75 Paterson Street, Suite 9 
New Brunswick, NJ 08901 
USA 
Tel: +1-732-342-9100 
Fax: +1-732-342-9339 
acl@aclweb.org
SPONSORS: 
SIGDAT 
SIGLEX 
SIGPARSE 
INDUSTRY SPONSOR: 
Intel China Research Center 
INVITED SPEAKER: 
Chin-Chuan Cheng (City University of Hong Kong) 
ORGANIZERS: 
Martha Palmer 
Mitch Marcus 
Aravind Joshi 
Fei Xia 
PROGRAM COMMITTEE: 
Srinivas Bangalore (AT&T Research Labs)
Keh-Jiann Chen (Academia Sinica)
Zhengdong Dong (Hownet)
Shengli Feng (Univ. of Kansas)
Kok Wee Gan (HKUST)
Laurie Gerber (Univ. of Southern CA/ISI)
Changning Huang (Microsoft Research China)
Chu-Ren Huang (Academia Sinica)
Wanying Jin (New Mexico State Univ.)
Kim-Teng Lua (National Univ. of Singapore)
John Kovarik (Department of Defense)
K.L. Kwok (Queens College)
Mary Ellen Okurowski (Department of Defense)
Fuji Ren (Hiroshima City Univ.)
Richard Sproat (AT&T Research Lab)
Keh-Yuh Su (Behavior Design Corporation)
Maosong Sun (Tsinghua Univ.)
Chew Lim Tan (National Univ. of Singapore)
Benjamin K. T'sou (City Univ. of Hong Kong)
Amy Weinberg (Univ. of Maryland)
Ralph Weischedel (BBN)
Andi Wu (Microsoft)
Dekai Wu (HKUST)
Nianwen Xue (Univ. of Delaware)
Jin Yang (Systran)
Shiwen Yu (Peking Univ.)
Chunfa Yuan (Tsinghua Univ.)
Joe Zhou (Intel China Research Center)
Qiang Zhou (Tsinghua Univ.)
FURTHER INFORMATION:
Martha Palmer, Mitch Marcus, Aravind Joshi, and Fei Xia
Department of Computer and Information Science
University of Pennsylvania
Philadelphia, PA 19104, USA
Email: {mpalmer,mitch,joshi,fxia}@linc.cis.upenn.edu
ACKNOWLEDGMENT:
Special thanks to Yen-Lin Yin at the Univ. of Pennsylvania for helping us to organize this workshop.
WORKSHOP PROGRAM 
Sunday, 8 October 2000 
8:45-9:00 Welcome 
9:00-9:50 Invited Talk: Zero Anaphors in Chinese Discourse Processing 
Chin-Chuan Cheng 
9:50-10:10 Sense-Tagging Chinese Corpus
Hsin-Hsi Chen and Chi-Ching Lin
10:10-10:30 Enhancement of a Chinese Discourse Marker Tagger with C4.5
Benjamin K. T'sou, Tom B.Y. Lai, Samuel W.K. Chan, Weijun Gao and Xuegang Zhan
10:30-11:00 Break
11:00-11:20 Two Statistical Parsing Models Applied to the Chinese Treebank
Daniel M. Bikel and David Chiang
11:20-11:40 Using Co-occurrence Statistics as an Information Source for Partial Parsing of Chinese
Elliott Franco Drabek and Qiang Zhou
11:40-12:00 Statistics Based Hybrid Approach to Chinese Base Phrase Identification
Tie-jun Zhao, Mu-yun Yang, Fang Liu, Jian-min Yao and Hao Yu
12:00-12:20 A Block-Based Robust Dependency Parser for Unrestricted Chinese Text
Ming Zhou
12:20-1:30 Lunch
1:30-2:30 Poster session
2:30-2:50 Knowledge Extraction for Identification of Chinese Organization Names
Keh-Jiann Chen and Chao-jan Chen
2:50-3:10 Statistically-Enhanced New Word Identification in a Rule-Based Chinese System
Andi Wu and Zixin Jiang
3:10-3:30 A Trainable Method for Extracting Chinese Entity Names and Their Relations
Yimin Zhang and Joe F Zhou
3:30-4:00 Break
4:00-4:20 Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface
Chu-Ren Huang, Feng-Yi Chen, Keh-Jiann Chen, Zhao-ming Gao and Kuang-Yu Chen
4:20-4:40 Comparing Lexicalized Treebank Grammars Extracted from Chinese, Korean, and English Corpora
Fei Xia, Chunghye Han, Martha Palmer and Aravind Joshi
4:40-5:00 The Research of Word Sense Disambiguation Method Based on Co-occurrence Frequency of Hownet
Erhong Yang, Guoqing Zhang and Yongkui Zhang
5:00-6:00 Panel: Prioritizing Chinese Language Processing Resources
Keh-Jiann Chen, Chu-Ren Huang, Bing Swen, Benjamin K. T'sou, Joe Zhou and Ming Zhou
Table of Contents 
PRESENTATIONS 
Two Statistical Parsing Models Applied to the Chinese Treebank 
Daniel M. Bikel and David Chiang .................................................... 1 
Sense-Tagging Chinese Corpus 
Hsin-Hsi Chen and Chi-Ching Lin ..................................................... 7
Knowledge Extraction for Identification of Chinese Organization Names 
Keh-Jiann Chen and Chao-jan Chen ................................................. 15 
Using Co-occurrence Statistics as an Information Source for Partial Parsing of Chinese
Elliott Franco Drabek and Qiang Zhou ............................................... 22 
Sinica Treebank: Design Criteria, Annotation Guidelines, and On-line Interface 
Chu-Ren Huang, Feng-Yi Chen, Keh-Jiann Chen, Zhao-ming Gao 
and Kuang-Yu Chen ................................................................. 29 
Enhancement of a Chinese Discourse Marker Tagger with C4.5 
Benjamin K. T'sou, Tom B.Y. Lai, Samuel W.K. Chan, Weijun Gao 
and Xuegang Zhan ................................................................... 38 
Statistically-Enhanced New Word Identification in a Rule-Based Chinese System 
Andi Wu and Zixin Jiang ............................................................ 46 
Comparing Lexicalized Treebank Grammars Extracted from Chinese, Korean, 
and English Corpora 
Fei Xia, Chunghye Han, Martha Palmer and Aravind Joshi ........................... 52 
The Research of Word Sense Disambiguation Method Based on Co-occurrence 
Frequency of Hownet 
Erhong Yang, Guoqing Zhang, and Yongkui Zhang ................................... 60 
A Trainable Method for Extracting Chinese Entity Names and Their Relations 
Yimin Zhang and Joe F Zhou ........................................................ 66 
Statistics Based Hybrid Approach to Chinese Base Phrase Identification 
Tie-jun Zhao, Mu-yun Yang, Fang Liu, Jian-min Yao and Hao Yu .................... 73 
A Block-Based Robust Dependency Parser for Unrestricted Chinese Text 
Ming Zhou ........................................................................... 78 
POSTERS 
Annotating Information Structures in Chinese Texts Using HowNet 
Kok Wee Gan and Ping Wai Wong ................................................... 85 
Machine Learning Methods for Chinese Web Page Categorization 
Ji He, Ah-Hwee Tan and Chew-Lim Tan ............................................. 93
Semantic Annotation of Chinese Phrases Using Recursive Graph 
Donghong Ji ........................................................................ 101 
Text Meaning Representation for Chinese 
Wanying Jin ........................................................................ 109 
How Should a Large Corpus Be Built? - A Comparative Study of Closure in Annotated
Newspaper Corpora from Two Chinese Sources, Towards Building a Larger
Representative Corpus Merged from Representative Sub Collections 
John J. Kovarik ..................................................................... 116 
A Clustering Algorithm for Chinese Adjectives and Nouns 
Yang Wen, Chunfa Yuan and Changning Huang ..................................... 124 
Extraction of Chinese Compound Words - An Experimental Study on a Very Large Corpus 
Jian Zhang, Jianfeng Gao and Ming Zhou ........................................... 132 
An Algorithm for Situation Classification of Chinese Verbs 
Xiaodan Zhu, Chunfa Yuan, K.F. Wong and Wenjie Li .............................. 140 
INVITED TALK 
Zero Anaphors in Chinese Discourse Processing 
Chin-Chuan Cheng ................................................................. 146 
