Chinese Syntactic Parsing Based on Extended GLR Parsing 
Algorithm with PCFG* 
 
Yan Zhang, Bo Xu and Chengqing Zong 
National Laboratory of Pattern Recognition, Institute of Automation 
Chinese Academy of sciences, Beijing 100080, P. R. China 
E-mail: {yzhang, xubo, cqzong}@nlpr.ia.ac.cn 
 
Abstract  
This paper presents an extended GLR 
parsing algorithm with grammar PCFG* that 
is based on Tomita’s GLR parsing algorithm 
and extends it further. We also define a new 
grammar—PCFG* that is based on PCFG 
and assigns not only probability but also 
frequency associated with each rule. So our 
syntactic parsing system is implemented 
based on rule-based approach and statistics 
approach. Furthermore our experiments are 
executed in two fields: Chinese base noun 
phrase identification and full syntactic 
parsing. And the results of these two fields 
are compared from three ways. The 
experiments prove that the extended GLR 
parsing algorithm with PCFG* is an 
efficient parsing method and a 
straightforward way to combine statistical 
property with rules. The experiment results 
of these two fields are presented in this 
paper. 
1. Introduction 
Recently the syntactic parsing system is one of 
significant components in natural language 
processing. Many parsing methods have been 
developed as the development of corpus 
linguistics and applications of linguistics. 
Tomita’ GLR parsing (Tomita M., 1986, 1987) 
is the most general shift-reduce method of 
bottom-up parsing and widely used in syntactic 
parsing. Several methods are based on it. Lavie 
(Lavie A., 1996) used the GLR* parsing 
algorithm for spoken language system. It uses a 
finite-state probabilistic model to compute the 
action probabilities. Inui (Inui K. et al., 1997, 
1998) presented a formalization of probabilistic 
GLR (PGLR) parsing model which assigns a 
probability to each LR parsing action. To 
shallow parsing, many researchers have made 
experiments with identification of noun phrases. 
Abney (Abney S., 1991) used two level 
grammar rules to implement the noun phrase 
parsing through pure LR parsing algorithm.  
Some new methods based on GLR algorithm 
aim to capture action probabilities by statistics 
distribution and context relations. This paper 
combines rule approach and statistics approach 
simultaneously. Furthermore, based on GLR and 
PCFG, we present an extended GLR parsing and 
a new grammar PCFG* that provides the action 
probabilities to prune the meaningless branches 
in the parsing table. Our experiments are also 
made in two parts: Chinese base noun phrase 
parsing and Chinese full parsing. The former is a 
simplified formalization of full parsing and is 
relatively simpler than the latter. 
This paper includes four sections. Section 2 
presents a brief description of rule structure 
system-PCFG*. Section 3 gives our extended 
GLR parsing algorithm and the parsing 
processing. Section 4 shows the experiment 
results of our parser including Chinese base 
noun phrases (baseNP) identification and 
Chinese full syntactic parser. The conclusions 
are drawn in section 5. 
2. A New Grammar (PCFG*) and the 
Rule Structure 
Grammar system is one of the important pars of 
a parsing system. We explain it in detail in the 
following section.  
2.1 Structure of Rules 
The definition of symbols in our system inherits 
the classifications of Penn Chinese tree-bank 
(Xia F., 2000). There are totally 33 
