Complex Features in Description of Chinese Language 
Feng Zhiwei 
Imtitute of Applied Linguistics 
Chinese Academy of Social Sciences 
51 Chaoyangmen Nanxiaojie 
100010 Beijing, China 
Abstract 
In this paper, the similarity of" multi-vahw label fimction" and "complex features" iv discussed. The author especially era. 
phasizes the necessity of complex feattwes for description of Chinese Iaruguage. 
1 Multiple- Value Label Function 
The phrase structure grammar (PSG) was used extensively in the parsing of natural law 
guage. A PSG can be expressed by a tree graph where every node has a correspondent label. 
The relationship between the node x and its label y can be described by a mono-value label 
function L: 
L(x) = y 
For every value of node x, there is only one corresponding value of label y. 
In 1981, we designed a multilingual automatic translation system FAJRA (from Chinese to 
French, English, Japanese, Russian, German). In 1985; we designed two automatic translation 
systems GCAT (from German to Chinese) and FCAT (from French to Chinese). In system 
FAJRA, we must do automatic analysis of Chinese, in system GCAT and FCAT, we must do 
automatic generation of Chinese language. We found that the linguistic features of Chinese ex... 
pressed by the mono-value label function of PSG is rather limited. In the automatic analysis 
of Chinese language, PSG can not treat properly the ambiguity, the result of analysis often 
bring about a lot of ambiguous structures. In the automatic generation of Chinese language, the 
generative power of PSG is so strong that a lot of ungrammatical sentences are generated. It is 
the chief drawback of PSG. In order to overcome this drawback of PSG, in system FAJRA, 
we proposed a multiple- value label function to replace the mono- value label function, and in 
systems GCAT and FCAT, we further improved this approach. 
A multiple-value label function can be described asbelow: 
L(x) = {y~, Y2,..., Y~} 
whereby a label x of tree can correspond to several labels {y~, ya, ..., y=}. By the means ol; 
this function, the generative power of PSG was restricted, the number of ambiguous structures 
were reduced, and the drawback of PSG was efl\]ciently overcome. 
Beginning with the augmented transition network (ATN) concept and .inspired by J.Bresnan's 
work on lexically oriented non-transformational linguistics, the lexical functional grammar 
(LFG) framework of J. Bresnan and R. Kaplan was evolved. Simultanously, M. Kay devise(\]. 
the functional m~Afication grammar (FUG)., G. G~dar, E. Klein and G. Pullum proposed. 
the generalized phrase structure grammar (GPSG). Implementation of GPSG 
1 
118 
at Hewtett-Packard led C. Pollard and his collcgues to design the head-driven phrase struc- 
ture grm:amar (HPSG) as a successor to GPSG. In all these formalisms of grammars, the 
complex features are widely used to replace the simple feature of PSG and it is an improvement 
of PSG. Therefore, The concept of complex features is very important for present development 
of computational linguistics. 
In the systems I;A,IRA, GCAT and FCAT, the values of labels must be the features of lan- 
guage, so the mtdtiple- value labels must be also the complex features of language. In fact, the 
concept of multiple--value labels and the concept of complex features is very similar. 
Historically, all these concepts are the results of separate researches in computational linguistics 
for improvement of PSG. ThLts we can take our multiple-value labels as complex features. 
The famous linguist De Saussure (1857--1913) had pointed out in his <Course in General 
Linguistics> : " Language in a manner oi" speaking, is a type of algeb,a consisting solely of 
complex terms" (p122, English version, 1959). He takes the flexion of" Nacht : N~/chte" in 
German as the example. The relation "Nacht : Niichte" c:m be expressed by an algebraic formu- 
la a/b in which a and b are not simple terms but result from a set of relations .... complex 
terms. The complex terms of Nacht are: norm, feminine gender, singular number, nominatix'e 
case, its principal vowel is "a". The complex terms of N'~.chte are: noun, feminine gender, 
plural number, nominative case, its principal vowel is "fi", its ending is "e", the pronunciation 
of" ch" changes from/x/to /C.'. De Saussure said: "But language being what it is, we shall 
find nothing simple in it regardless oF our approach; every where and always there is the stone 
conlplex equilibrium of * " ~ , 1959). So ~e~m, that mutually condition each other" (P122,E. v. 
called" complex terms" mentiot~ed here by Dc Saussure is nothing but the "complex features" 
or the "multiple -value labels"in computationa! linguistics. After all, De Saussure is a scholar 
with a foresight, fie has proved himself to be a pioneer of modern linguistics. The concept of 
'~omplex terms" oC De Saussure has served as a source of inspiration for tks to proposed the con- 
cept of "muhiple-value labels". However, the property of Chinese language is more important 
than the concept of De Saussure in the dex.elopmemnt of our "multiple--value labels"( or "com- 
plex features" ), because without the "multiple- value labels" (or "complex features" ) we can 
not adequately describe the Chinese language in the autornatic translation. 
{2 Necessity of Complex Feature for Description of Chinese Language 
If there is a necessity of complex features in description of' English, then this necessity is 
more obvious fbr description of Chinese. 
The reasons are as following: 
1. There is not one--to--one correspondence relation between the phrase types (or parts of 
speech) m~d their syntactic functions in the Chinese sentences. 
Chinese is dilRrent from Endish. In English, the phrase types generally correspond to their 
syntactic functions. For example, we have NP + VP --- > S in English, whereby NP corre- 
sponds to subject, VP corresponds to predicate, and S is a sentence, becoming a construction" 
i19 
subject + predicate". There is a one-to-one correspondence between the phrase types and 
their syntactic ftmctions. In Chinese, the structure "NP + VP" can form a sentence, so we 
have also NP + VP - - > S. E. g. in phrase "d~ 2E / IN ~" (little Wang coughs), "/\]x t" 
(little Wang) is a NP, "~N ~" (to cough) is a VP, forming a construction" suk!ject 
.+predicate". However, in many cases, the NP doesn't correspond to the subject, the VP doesn't 
correspond to the predicate. For example, "~y~; /i~~-t" (progrmming)in Chinese, ".~ j:~" 
(program) is a NP, "iRi-l" (.to design) is a VP, but this NP is a modifier, and this VP is the 
head of the structure "NP + VP". The structure "NP + VP" can not form a sentence, but 
fbrm a new noun phrase NPI: NP + VP - - > NP\[. So this noun phrase becomes a con- 
struction " modifier + head". Similar example are: "i~--N / ~\]~" (langtuage learning), "~ 
~ / rN-~" (physics examination), etc. In these phr~,~ses, the phrase tyws "NP + VP" can not 
form a construction "subject + predicate", but form a construction "modifier -~: head". In 
this case, the "NP + VP" is a syntactically ambiguous structure, the simple features " NP 
+ VP" can not distinguish the differences between the construction "subject + predicate" and 
the construction "modifier + head". We must use the complex features to describe these dif- 
ferences. 
The structure " NP -I- VP" which forms a construction " subject + predicate" can be 
formularized as following complex .feature set: 
CAT = N + CAT --= V 
\[_ \[_ SF = SUBJ_ SF = PRED _t _ 
whereby K is the feature of" the phrase type, NP and VP are the values of this feature; CAT is 
the feature o\[" the parts of speech, N and V are the values of this feature; SF is the feature o~" 
syntactic function, SUB.I and PRED are the values of this feature. With the complex fea- 
tures, the structure " NP + VP" which forms a construction " modifier + head" can be 
formutarized as following: 
CAT = N + CAT = V 
SF = MODF_ SF = HEAD 
whereby MODF and HEAD are the values of the feature SF. 
Obviously, the structure "NP + VP"is ambiguous, there are two syntactically different con- 
structions included in this structure, their phrase types are identical, but their syntactic func- 
tions are different. In order to adequately discribe the differences of them, we have to use the 
complex features in deed. 
In English, we have VP + NP - -> VP1, the VP corresponding to the predicate:, the 
NP to the object, and VP1 is a new verb phrase, it is a construction "predicate + object". In 
Chinese, the structure "VP + NP"can form a new verb phrase, so we have also VP + NP 
- - > VP1. For example, "$,-\]~/~ / I'~'\]~" (to discuss a problem) in Chinese, "$¢i"~" (to dis- 
cttss) is a VP, "l'~-i\]~" (problem)is a NP, forming a new verb phrase "predicate + object". 
120 
However, in many cases, VP doesn't correspond to predicate, NP doesn't correspond to ob 
ject. For example, "/di~lR / GN" (taxicab) in Chinese," ~{ft" (to hire) is a VP, "~" 
(automobile) is a NP, but this VP is a modifier and this NP is the head of the structure "VP 
+ NP". This structure can not form a new verb phrase, it forms a new noun phrase, so we 
have: VP + NP - - > NP1. The syntactic function of this noun phrase is a construction " 
modi\[ier + head". The similar examples are:" ~-~ / -_)7 ~," (approach for reseach), "~ 
/~" (regulation for study), "~P~,Yk/i~" (policy of openning the door). The phrase 
type ,;tructure "VP + NP" can not form a construction "predicate + object", but forms a con- 
struction "modifier + head". In this case, the structure "VP + NP" is syntactically ambigu- 
ous. The simple features "VP + NP" is not enough to distinguish the differences between the 
construction " predicate + object" and the construction "modifier + head". We mLmt use the 
complex t~:atures to describe these differences. 
The structure" VP + NP" which forms a 
formularized as following complex feature set: 
I K = VP 
CAT = V + 
_ SF = PRED_ 
construction " predicate + object" can be 
K = NP -I d 
CAT = N 
SF = OBJE_ 
I 
whereby PRED and OBJE are the values 
The structure " VP + NP" which 
formularized as following complex feature 
I K=VP 
CAT = V 
SF = MODF_ 
of feature SF. 
forms a construction " modifier 
set: 
-I- I 
K=NP -1 
CAT = N | 
! SF = HEAD2 
+ head" can be 
whereby MODF and HEAD are the values of feature SF. 
Obviously we can not only use the simple fcature to describe the structure "VP + NP", 
we have to use the complex features. 
2. For the construction which have the same phrase type structure and the same syntactic func- 
tion structure, its semantic relation may be diflErent. So there is not simple one--to-one corre- 
spondence between the syntactic function and its semantic relation. 
The values of semantic relation feature are the following: agent, patient, instrument, scope, 
aim, :result,... etc. In English, there is no much correspondences between the syntactic func- 
tion of sentence element and its semantic relation. In Chinese, the correspondence is more com- 
plicated and sophisticated than in English. 
In the construction" subject + predicate" with corresponding phrase type structure " NP 
+ VP', the subject may be agent, but it may also be patient, or instrument, etc. For exam- 
ple, in the sentence "iIf~/i;~T" (I have read), the subject ":ffB" (I) is the agent, but in the 
sentence " --p} / ~ T" ( the book has been read), the subject "@" (book) is the patient, and 
the verb "i~T" (to read) doesn't change its form, it always takes the original form. In most 
European languages, if the subject is the agent, then the verb must take an active form, and if 
121 
the subject is patient, then the verb must take the passive form. However, In Chinese, the verb 
always keeps the same form no matter whether the subject is an agent or a patient. In an over- 
whelming majority of cases, the passive form of verb is seldom or never used in Chinese. By 
this reason, in the atttomatic information processing of Chinese language, it is not enough to 
only use the features of phrase types and synt~tic functions, we must use the features of 
semantic relations, thus the features for description of Chinese language will become mor~ com- 
plex. 
In the structure "NP + VP", if the syntactic function of NP is subject, and the semantic 
relation of NP is agent, then the complex features of this structure can be formularized as fol- 
lowing: 
\[ K=NP ~ K=VP \] 
CAT = N + CAT = V \] 
SF = SUBJ l k. SF = PRED J 
L SM AGENT _I 
whereby SM is the feature of semantic relation, AGENT is the values of feature SM. 
In the structure "NP + VP', if the syntactic function of NP is subject and the semantic 
relation of NP is patient, then the complex features of this structure can be formularized as fol- 
lowing: 
K = NP 
t CAT = N 
SF = SUBJ 
_ SM = PATIENT_ 
+ i K = VP 1 CAT = V 
SF. = PRED 
whereby PATIENT is the value o\[ feature SM. 
In the strcture " VP+NP" with corresponding syntactic construction" predicate +object", 
the object may be a patient, but it may also be an agent, or an instrument, or a scope, or 
an aim, or a result,..., etc. In the sentence "~${/~-Y" (to wipe the window) "~" , .~, ( to 
wipe) is the predicate, "~" (window) is the object, and its semantic feature is the patient. 
The complex features of this sentence can be formularized as following: 
\[ K = VP - 
CAT = V 
\[_SF = PRED 
+ CAT = N 
SF = OBJE 
_ SNI = PATIENT _ 
whereby PATIENT is thc value of feature SM. 
But we have also the following sentences where the semantic relation of object is not the 
patient. There are many very interesting phenomena in Chinese language: In the sentence "~ 
71"/~." (the father died), the object "~;~" (father) is the agent of the verb "~T" (to 
die). This structure can be formularized as following comlex feature set: 
122 
cK. = VP -1 
AT=V . 
F = PREDJ 
whereby AGENT is the vaiue O\[ feature SM. 
-\[- ,1 CAT = N 
{St ~" = OBJE 
\[_SM -- AGENT 
\]\[n the sentence "lQi/~)'g" (to cat with a big bowl), the object" ~ @L"' (big bowl) is the 
~/_. n instrument of the verb " z. (to eat) This structure can tie formularizcd as following complex 
feature set: 
CAT = V 4- \[CAT = N 
SF = PRED \]SF = OBJE 
L SM = INST __ 
whereby INST is the value of` l'e.aturc SM. 
In the sentence " ~- / "~ ~'-" >~j..:-- ( to 
(mathematics) is the scope o, e the verb "&" 
as following complex feature set: 
~K =: VP 
I 
\]CAT-- V 
iS1 r = PRED L ¢-- 
examine in mathematics), the object" ~Z-~-~" 
to examine). This strt,.cture can be formularized 
F 
rK = NP 
CAT = N ! 
I 
whereby SCOPt: is fine \,:due o\[" \['cattue SM. 
111 tlie sentence "~K, .... c;l 3; C";> /- \[. ' .... (to exanqme 
object " g:~)~-~.£'" (~raciualtc student)is the aim o{" tlne verb ")~" {to CXet!ll\[liC). 
can bc fornlt'.lp.riTcd as f'oll,xoi'a# complex fcatur'c set: 
in order to become a gracv~mtc stuacnt), the 
This structure 
i !K .... vP \] I K ---: >'P -; -I 
I IS\]:_ =PR~D! {sv--oBo~:~i , i J } 
\[ L_sv = a~c, ' 
t J 
whereby AIM is the value o{" feature SM, 
n ~v_ / ,.;,~: Z'~.n In the sentence ~J/ f;',ixo (to pass an cxanfinatiot~ and gct an cxccllcnt marks). "The ob- 
ject " ,':~:z ......... :;'at;J" (excellent ,marks) is the result of the verb @ (to examine). -finis structure can 
be formularized as fbllowing complex {;cature set: 
~-~-K = VP -- 
CAT == V 
SF = PRED~ 
TIr , whereby RESI.j _. I is ttne value of' feature SM. 
.'K "- NP -! _t CAT = N i 
SF = OBJE 
SM = RESULT_I 
Thus we can sec very clcarly that only with the complex features the differences of these 
sentences can be revealed sufficiently . 
3. The grammatical and semantic fcatures of words play an important role to put f`o> 
ward the rules of` parsing. These features can bc used as the couditions in the rules , and 
123 
they must be included into the system of complex features. Thus the complex features is the 
basis to set up the rules for parsing of Chinese language. I 
In the structure VP + NP, if the grammatical featme of VP is intransitive verb, then the 
syntactic function of VP must be modifier, and the syntactic function of NP must be head. The 
syntactic function of this structure VP + NP will be "modifier + head". Thus we can have a 
rule which is described with complex features as following: 
K = VP x,! K = 
TRANS = I 
-K = VP -~ 
---> I CAT=V / 
TRANS = IV I 
t !_sF = MODFJ 
÷ I 
K = NP \] 
iL%d 
whereby TRANS represents the feature of transitb~ty of verb, IV represents intransitive verb, 
it is a value of feature TRANS. 
This rule means: In the structure VP + NP, if the value of` TRANS d" VP is IV, then the 
syntactical function of VP can be given the value MODF, and the syntactical function of NP 
can be given the value HEAD .... 
The semantic features of words can also be used to put forward the rules of' parsing. In the 
structure VP + NP, if VP is transitive verb, then we have to tLSe the semantic Features or NP 
to decide the valtm of. syntactic Function of. this structure. Generally speaking, if VP is transitive 
verb, the semantic feature of. NP is" abstract thing", or" title of a technical or processional 
post"', then the syntactical function of. VP is modifier, and the syntactical function o2 NP is 
head. For example, in the phrase " fill }5~/ ~1 -~~" ~ purt~ose of training) tlne VP ijl\[ ,~,: (to 
train) is a transitive verb, a~d the semantic Feature of' NP "i-! ~)"(purpose)is "abstract thing", 
we can decide that the syntactical f'unction or"-i.il\[ ~,~i" (to train) is m~iificr, and the 
syntactical function of "hi ~{J" (purpose)is head. In the phrase "~ f~';5/ {~.%" (high trainirg 
teacher), the VP "~I'~}" (to engage in advanced studies) is a transitive verb, the NP ":g£0~" 
(teacher) is a title of' professional post, thus we can decide that the syntactical function of' 
VP " :'~i~7"." (to engage in advanced studies)is m.odi~ier, the sy~tacticaI function of. NP "~Jili" 
'(teacher) is head. 
Therefore we can have two folhvoing rules for parsing: (1)FK-- 
vP ~ \[K: >-,, -\] 
ICAT- V I+/c 'm = ,", / .... > 
LTRANS =TV LSEM = ABSJ 
+ A --N /-->, /C T= V ! 
LSEM = Pk \] I RANS = I 
_ jsF = MODF/ 
f-K = VP -\] t 7K = NP >3 
CAT -- V / +' \]CAT = N 
TRANS = TV | |SEM = ABSF t 
_ \[_SF = MODF ~ SF = HEAII)__ 
ff = NP -1\] 
J LSF = I--IEADJ 
(2) I K = VP l 
CAT = V 
TRANS= T 
whereby TV represents transitive verb, it is a value of TRANS, ABS represents abstract thing 
it is a value of SEN{, PP, F represents title of a technical or professional post, it is another value 
124 
of SEM. On the basis of these complex features, two new values of feature SF can be deduced: 
SF = MODY and SF = HEAD. 
With the complex features, we got good results in machine translation systems FAJRA, 
(3CAT and FCAT (see Annex). 
125 
Annex 
Chinese input 
~'~A 4 Novembre igSI 
Fig. : A result of multilingual machine translation 
126 
......... En'ghsh output 
~;l'~l~'/~i 4 Novembr~ 1981 
Language trod comrmmicatton play important role in human 
lllo. Scholar'~ are actively studying automatic tranalat',c, tt o~ long- 
tinge.They have proposed the theory o~ automat on and tile theory 
of formal grammar. An automaton is an abstract machim~ that can 
recogaize language. Grammar rule can transform a tree graph to 
another tree graph automati~ly. Thin system for automatic 
tra~lation had been used by them for five years. If we compare 
this system with other systems, then we can see its advantage. 
French output 
Y~I'~ 4 Novembre 1981 
Lc langage at la o~mmumcation jouoJat tln r~le irnt~rta~t 
la vie humame.L~ eavtml~ s~tlt el:). train d'etUdier actival~.nt la 
traductiort autorcmtique des langages. ILa ant ~pos6 la th6orle dos 
automates et la theorie des grammaires formelles, un automata 
eat une machine atmtrdite qui peu~ r~conna Itro le langage. La 
regIe do la ~tmmaire pout transformer automattquettteat una 
arborescaace an tma autre arborescance. Ce s~t~aa de ia maduction 
automatlque a 6t6 uti.li.~ par enx pendant chae t ans, Sl nous compa- 
rons ce ~tyst6me ~ d'autres ~,nt~m~, alors nous pouvcals ,,.oh" een 
avaI~tages. 
Japanese ou~pu~ 
R~ 4 Novembre 1881 
i i i i i i i u 111111 ii i i rlllll -- 
German ou~pu~ 
~"a~/~ 4 Novembre 19BI 
Die Sprache mad die Komml/~tikatio/~ spielen hal \]Leben der 
Menachen Wich~ige Rolle. Dis Wi.~az~chaftler sind gerade dar..~, 
die autornatLgche tJ~3or~etzung der SFcachen aktiv zu stud.larCh. Sio 
haben dic Theorte der Automateax und die Thaorie der form~ext 
Grammatikan Vorg~chlagen. Em Automat ist clue abstrekta Man- 
claW.e, die die Sprache erksmaen Kann. Die Grammatik~'egel Kaxm 
ethan 13atmagraphen tn elnan anderen Baamgraphen automatinch 
umwandeln. Dieses System z'ur automatiachen \[3"~,-~tzUttg war van 
lhnan seit f~nf Jahren verwendet warden. Wenn wit ~esee Syntem 
mit anderen Systcmen verglmchelz., dann k~Snnctl wit ne.txten Vo~ 
sohel~ 
Russian output 
~\[~'l'~I~ 4 Novembre 1981 
~3btx 11 Kor, tMyIl~gaRItst llrpaltyr Ba)Kgy~o DOab e 7~I~115/1 .leJlo~'-- 
~lec-rna.yqd~e aK'rrtm~o nclyaa~c a~ro~ani~ecma~ ~epea~ ~l~t~on.Otm 
SM,7.dKraloT TeopK10 aBTOMaTa It TeOp.~x) d~opMaJIbffOgl gl~t~L'4aTal~t. 
ABTOMaT C<:'fh O,nJ~a a0cTImKTllaa ~am~aa,Koropan ~to>KeT pacilo3HaBaTb 
~.3blg.'fIpa~lRO rpa.~MaruKa Moa~er Rf~FO~aTIIqeCKR npel~patttar~ O:tlly 
ape~Brl.~t~ylO cxe,'ay ua apvry~ ,zpe~s~;~uym cxe:~y.~Ta cacreHa aa- 
~otaa£at{eclcoro ttepel3o~a acllo,2h.3oBarla flYd~ yace O~ her. Ec.an Mla 
cpa~ttme~t zwy cac're.,~y c /pyfmqa cacxeMa.~ta,ro ma Mo~ce~t ~Aer~ 
e~ apen~,~rmec.nm. 

BIBLIOGRAPHY 

Feng Zhiwei, multiple-branched and multiple-labeled tree analysis of Chinese sen- tence, <Journal of Artificial Intelligence>, No 2, 1983. 

De Saussure, Course in General Linguistics, English version, 1959. 

M. Kay, Parsing in functional unification grammar, in <Natural Language Parsing' Psychological, Computational and Theoretical Pers~ctives> , 1985. 
