PREFERRED ARGUMENT STRUCTURE FOR DISCOURSE UNDERSTANDING 
KA-WAI CHUI 
Matsushita Electric Institute of Technology 
(Taipei) Co., Ltd. 
Abstract 
The main purpose of communication is to exchange information. Any discourse understanding model should be able to process the flow of information throughout the entire text. According to Du Bois's (1987) studies of information flow in discourse across a number of languages, the distribution of information among argument positions in clauses is by no means random; rather, certain grammatical patterns tend to recur consistently. He thus formulated a Preferred Argument Structure (PAS) for the preferential structural configurations of arguments. Our examination of Chinese narrative discourse shows that the language also displays PAS, yet the Chinese PAS challenges the universality of the one Du Bois proposed. Based on the quantity and distribution of lexical arguments and new referents across grammatical roles in discourse, we find that the Chinese PAS also maintains at most one new argument within a basic information processing unit. Since new referents in Chinese have to be encoded in full NP form, a clause is less likely to have more than one lexical argument. Moreover, this single new argument appears preferentially in the O role, rather than in the A and S roles that Du Bois's PAS formulates. Since the structure of information flow has a corresponding grammatical patterning, grammatical and pragmatic processing can be carried out simultaneously, in that the information status of an argument can be identified by virtue of grammatical analysis. Although PAS is neither universal nor categorical, it can function in a discourse understanding model as a heuristic device to process the information structure of a connected spoken discourse.
According to Du Bois's (1987) studies of information flow in discourse across a number of languages, the distribution of information among argument positions is neither arbitrary nor random: certain grammatical patterns are preferred over others, and they tend to recur consistently in connected spoken discourse. In other words, the structure of information flow has a corresponding grammatical patterning. Those recurrent patterns, which reflect speakers' actual language use, are formulated as Preferred Argument Structure (PAS). The PAS he formulated comprises the following constraints: the One Lexical Argument Constraint, to avoid more than one referent in full NP form per clause; the Non-Lexical A Constraint, to keep the single lexical referent out of the A role; the One New Argument Constraint, to avoid more than one referent carrying new information per clause; and the Given A Constraint, to keep the new referent out of the A role. However, our examination of Chinese narrative discourse shows that the PAS displayed by this particular discourse genre challenges the universality of Du Bois's. The idiosyncrasy of the Chinese PAS is discussed in this paper.
In fact, from the computational point of view, whether PAS is universal or language-specific, its existence has significant implications for discourse understanding. On the one hand, it enables grammatical and pragmatic processing to be carried out simultaneously, because the information status of a referent can be identified by virtue of grammatical analysis; on the other hand, PAS can function as a heuristic device to process the information structure of a connected discourse.
1. Introduction 
The main purpose of communication is to exchange information. A speaker may employ various strategies to organize the information he intends to convey, in that some elements bear old information while others carry new information. Therefore, a discourse understanding model should be able to process the flow of information throughout the entire text. This paper focuses on the referring arguments in Chinese narrative discourse, and our main concern is how they are structured in relation to information flow.
2. Preferred Argument Structure in 
Chinese Narrative Discourse 
Unlike the languages Du Bois (1987) has studied, Mandarin Chinese is a typologically different language with no inflection and relatively free word order. Nevertheless, it still exhibits its own idiosyncratic PAS in spoken narrative discourse. The corpus for the present study comprises eight oral narratives told by eight native speakers of Mandarin aged 20-25. They were asked to recount the story of the popular movie Ghost to the interviewer in a speech laboratory. It portrayed
Actes de COLING-92, Nantes, 23-28 août 1992 1142 Proc. of COLING-92, Nantes, Aug. 23-28, 1992
a young man who was killed accidentally in a robbery, and who tried to protect his girlfriend from the murderers and to take revenge on them in the form of a spirit. The narratives were taped for later transcription.
To study the Chinese PAS, our examination focuses on the issues of quantity and role in the distribution of lexical arguments and new referents across grammatical positions, at both the grammatical and pragmatic dimensions.
2.1 Preliminaries for Analysis 
The 120 minutes of narrative were segmented into intonation units, each identified by ear as a stretch of speech uttered under a single coherent intonation contour and typically bounded by a pause. Chafe (1987) has hypothesized that intonation units, as linguistic expressions of focuses of consciousness, are independent processing units typical of spoken discourse. The present corpus contained a total of 1433 intonation units, with a mean length of 6.69 words. The observation that the clause, defined as a verb and its arguments, often coincides with the intonation unit (Du Bois, 1987; Chafe, 1987, 1988) was further confirmed in this study, since 85.28% (1222) of the intonation units contained clausal elements. Units consisting of false starts, repetitions, filled pauses, or clause fragments such as conjunctions, adverbials, and particles were excluded from further analysis. The study of Chinese PAS is therefore based on clauses. The following is a sample of five clausal intonation units (a-e) produced by a female speaker:
(1) a. jiouyitian ta genzhe ta nupengyou
       one day he follow his girlfriend
       'One day, he followed his girlfriend.'
    b. zai ta nupengyou jiali deshihou
       in his girlfriend home when
       'When (he was) in his girlfriend's home,'
    c. ta nupengyou zai huan yifu
       his girlfriend PROG change clothes
       'his girlfriend was changing clothes.'
    d. ranhou you yi ge huairen
       then there-be one CL bad guy
       'Then, there was a bad guy.'
    e. huairen chuang jinlai
       bad guy break in
       'The bad guy broke in.'
Within each clause, the morphological type of each referent, its grammatical role, and its information status were all recorded. The morphological type of an overt referent in Chinese was either a lexical NP or a pronoun, and its surface grammatical role was classified as A (transitive subject), S (intransitive subject), O (transitive object), or Oblique (object of a preposition). Furthermore, Chafe's (1987) three-way distinction of information status for referents was adopted, mainly because his categories are grounded in the actual cognitive processing of information transfer by language users. The three categories were given information, accessible information, and new information. A given referent referred to an entity mentioned previously, while a new referent was one that had not yet been brought up in the prior context. Intermediate between these two was accessible information, either coming from the expectations associated with a schema or resulting from deactivation from an earlier state. Following Du Bois (1987), a referent was operationally counted as accessible through deactivation when it was at least twenty propositions away from its most recent appearance.
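The coding scheme just described can be pictured as one record per argument plus a rule for assigning information status. The following is only a minimal sketch under stated assumptions: the names `Argument`, `information_status`, and `DEACTIVATION_DISTANCE` are hypothetical, clause indices stand in for propositions, and whether a referent is expected from a schema is supplied by the annotator as a flag rather than computed.

```python
from dataclasses import dataclass
from typing import Optional

# Threshold taken from the text: a referent counts as accessible by
# deactivation when its last mention is at least twenty propositions back.
DEACTIVATION_DISTANCE = 20


@dataclass
class Argument:
    """One coded argument of a clause (hypothetical record layout)."""
    form: str    # surface form, e.g. "huairen"
    morph: str   # "lexical" or "pronoun"
    role: str    # "A", "S", "O", or "OBL"
    status: str  # "given", "accessible", or "new"


def information_status(current: int, last_mention: Optional[int],
                       schema_evoked: bool = False) -> str:
    """Assign given / accessible / new from clause indices.

    Assumption: clause indices approximate proposition counts.
    """
    if last_mention is None:
        # Never mentioned before: accessible only if a schema makes it expected.
        return "accessible" if schema_evoked else "new"
    if current - last_mention >= DEACTIVATION_DISTANCE:
        # Mentioned long ago: deactivated, hence accessible rather than given.
        return "accessible"
    return "given"


# Example: "huairen" in clause (1e), mentioned one clause earlier in (1d).
bad_guy = Argument("huairen", "lexical", "S", information_status(5, 4))
```

Keeping the three attributes in one record mirrors the way the later tables cross-tabulate role against morphological type and against information status.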
2.2 The Grammatical Dimension of PAS 
The purpose of studying PAS at the grammatical dimension is to examine whether there is a preferred surface configuration of arguments in the observed data. Therefore, we investigate both the number of lexical (NP) arguments and their distribution across the grammatical roles in clauses.
According to our tabulation in Table 1, of the 1127 clauses (excluding the equational type), those with zero or one lexical argument are the most common structures and constitute a distinct majority (94.15%).
Table 1. Frequency of clauses with 0, 1, and 2 lexical arguments
             frequency   percentage
0 lex arg       587        52.09
1 lex arg       474        42.06
2 lex arg        66         5.85
Total          1127       100
($\chi^2_{.99}(2) = 399.89$)
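The statistic reported under Table 1 can be checked with a goodness-of-fit computation. The sketch below assumes that the expected distribution is uniform across the three clause types; the paper does not state this, but under that assumption the result comes out close to the reported value. The function name `chi_square_gof` is illustrative only.

```python
def chi_square_gof(observed):
    """Pearson chi-square against a uniform expected distribution (assumed)."""
    expected = sum(observed) / len(observed)
    return sum((o - expected) ** 2 / expected for o in observed)


# Clause counts from Table 1: 0, 1, and 2 lexical arguments.
table1 = [587, 474, 66]

stat = chi_square_gof(table1)
# Share of clauses with at most one lexical argument.
share_0_or_1 = (table1[0] + table1[1]) / sum(table1)

print(round(stat, 2), round(100 * share_0_or_1, 2))
```

With two degrees of freedom, a statistic of this size is far beyond the critical value at the .99 level, which is what licenses the claim that the skew toward zero or one lexical argument is not accidental.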
Since only transitive verbs can take more than one argument, it is necessary to separate them from the intransitive ones for tabulation, in case the rarity of two-lexical-argument structures is simply due to the rarity of transitive clauses. The result in Table 2 shows clearly that even in transitive constructions, two-lexical-argument structures are still a minority (9.17%). The result supports Du Bois's One Lexical Argument Constraint, in that "there is a tendency for speakers to avoid more than one lexical argument per clause" (p.819).
Table 2. The frequency of lexical arguments in transitive and intransitive clauses
             Transitive       Intransitive        Total
             freq     %       freq     %       freq     %
0 lex arg     321   44.65      266   65.2       587   52.09
1 lex arg     332   46.18      142   34.8       474   42.06
2 lex arg      66    9.17        -     -         66    5.85
Total         719  100         408  100        1127  100
($\chi^2_{.99}(2) = 66.56$)
Since speakers tend to use at most one lexical argument in a single clause, it is necessary to study whether this single lexical referent is randomly distributed across the grammatical roles. Our tabulation in Table 3 shows that O (84.3%) and Oblique (92.31%) each contain an overwhelming proportion of lexical arguments, whereas A and S contain a smaller proportion of them.
Table 3. Grammatical roles and morphological types of arguments
          lexical         pronominal      Total
          n      %        n      %        n
A        155   38.08     252   61.92     407
S        132   55.46     106   44.54     238
O        306   84.3       57   15.7      363
OBL      192   92.31      16    7.69     208
Total    785   64.56     431   35.44    1216
($\chi^2_{.99}(3) = 265.09$)
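The association between grammatical role and morphological type in Table 3 can be examined with a test of independence on the 4 x 2 table of counts. The following sketch assumes the standard Pearson computation, with expected counts taken from the row and column totals; the paper does not spell out its procedure, and the function name is hypothetical.

```python
def chi_square_independence(table):
    """Pearson chi-square of independence for a table of counts.

    Returns the statistic and its degrees of freedom.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of role and morphological type.
            exp = row_totals[i] * col_totals[j] / n
            stat += (obs - exp) ** 2 / exp
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


roles = ["A", "S", "O", "OBL"]
# Rows follow Table 3: [lexical, pronominal] counts per role.
table3 = [[155, 252], [132, 106], [306, 57], [192, 16]]

stat3, dof3 = chi_square_independence(table3)
# Proportion of lexical arguments within each role (the % column of Table 3).
lexical_share = {r: row[0] / sum(row) for r, row in zip(roles, table3)}
```

The per-role shares make the asymmetry visible directly: O and Oblique are overwhelmingly lexical, while A is mostly pronominal.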
Of all referents, 64.56% are lexical. These lexical referents are not randomly distributed across the grammatical positions: 38.98% of them appear in the O role, while the A and S roles are restricted in admitting lexical referents, as indicated by Table 4.
Table 4. Distribution of lexical arguments across grammatical roles
          frequency   percentage
A           155         19.75
S           132         16.82
O           306         38.98
Obl         192         24.45
Total       785        100
($\chi^2_{.99}(3) = 91.17$)
Du Bois's Non-Lexical A Constraint avoids lexical referents only in the A position. Chinese speakers, by contrast, disprefer both the A and S roles for mentioning a referent lexically; it is the O (or Oblique) position that preferentially hosts lexical arguments. The Lexical O Constraint is thus proposed to characterize this phenomenon in Chinese narrative discourse. In short, the One Lexical Argument Constraint and the Lexical O Constraint, which are constraints on quantity and role respectively, constitute the Chinese PAS at the grammatical dimension. A clause usually contains at most one lexical argument, and this single argument preferentially appears in the O role. Although these are not categorical rules, they represent a statistically significant tendency in actual language use.
2.3 The Pragmatic Dimension of PAS 
The preceding section showed that in narrative discourse different argument positions of a clause have distinct morphological preferences. This section studies the pragmatic dimension of PAS by examining the quantity of new arguments, as well as their distribution across the grammatical roles.
First, both transitive and intransitive clauses contain either zero or one new referent, with zero-new-referent clauses predominating (81.06%), as indicated by Table 5. Significantly, not a single clause contains two new referents. The result supports Du Bois's One New Argument Constraint to "avoid more than one new argument per clause" (p.826).
Table 5. The frequency of new arguments in transitive and intransitive clauses
             Transitive       Intransitive        Total
             freq     %       freq     %       freq     %
0 new arg     442   77.27      217   90.04      659   81.06
1 new arg     130   22.73       24    9.96      154   18.94
Total         572  100         241  100         813  100
($\chi^2_{.99}(1) = 18.01$)
To understand whether the single new referent 
is randomly distributed across A, S, O, and 
Oblique, it is necessary to examine the 
distribution of information across these 
positions. As indicated in Table 6, a substantial 
proportion of A and S carry old information, 
and new referents preferentially occur in O and 
Oblique. 
Table 6. Grammatical roles and information status of arguments
          new             accessible      given           Total
          n      %        n      %        n      %        n
A         12    2.95      10    2.46     385   94.59     407
S         24   10.08      11    4.62     203   85.3      238
O        122   33.61      34    9.37     207   57.02     363
OBL       81   38.94      25   12.02     102   49.04     208
Total    239   19.65      80    6.58     897   73.77    1216
($\chi^2_{.99}(6) = 229.02$)
Of the 239 new referents found in the corpus, a large portion occur in the O role (51.05%), as shown in Table 7, while only a small portion appear in the A and S roles, which overwhelmingly convey old information. Since Chinese speakers disfavor both the A and S roles for mentioning a new referent, Du Bois's Given A Constraint, which "avoids introducing a new referent in the A-role argument position" (p.827), does not adequately describe Chinese narrative discourse. The New O Constraint is therefore proposed for Chinese to characterize the free occurrence of new referents in the O role, as well as their strong restriction in the A and S roles.
Table 7. Distribution of new arguments across grammatical roles
          frequency   percentage
A            12          5.02
S            24         10.04
O           122         51.05
Obl          81         33.89
Total       239        100
($\chi^2_{.99}(3) = 131.96$)
Comparing the frequency distributions of A and S, A codes new referents even more rarely than S. This can be explained by the fact that Chinese has a type of presentative construction which "performs the function of introducing into a discourse a noun phrase naming an entity" (Li & Thompson, 1981). Verbs of this sentence type are usually intransitive, and the arguments following them usually carry new information. Since speakers do not necessarily use presentative constructions to introduce a new entity, such constructions constitute only a minority (20 clauses) in our corpus, as exemplified in (2) and (3).
(2) turan you yi ge huairen paochulai qiangjie
    suddenly exist one CL bad guy run out rob
    'Suddenly, there is a bad guy running out to rob.'
(3) jie shang you yi ge zhaopai ya
    street on exist one CL signboard PART
    'On the street, there is a signboard.'
In short, the One New Argument Constraint and the New O Constraint constitute the Chinese PAS at the pragmatic dimension. There is a strong tendency in discourse to limit the number of new arguments in a clause to a maximum of one. This single new referent tends to be introduced in the O (or Oblique) role, and subsequent mentions preferentially appear in the A and S roles. It is this preponderance of old information in the {A, S} alignment that gives Chinese the distinction of being a topic-prominent language.
2.4 Correlation of PAS between Grammatical
and Pragmatic Dimensions
We have studied the quantity and role constraints that constitute PAS for Chinese narrative discourse at both the grammatical and pragmatic dimensions. The correlation of PAS between these two dimensions is so strong that the grammatical One Lexical Argument Constraint and Lexical O Constraint are parallel to the pragmatic One New Argument Constraint and New O Constraint respectively, as shown in Table 8. In other words, the most preferred structure has at most one new argument within a single clause. Since new referents in Chinese have to be coded in full NP form, a discourse unit is less likely to include more than one lexical argument. Moreover, there is a strong tendency for the single new argument to appear in the O role, so that the lexical referent typically appears in this position. The flow of information does have a corresponding grammatical patterning.
Table 8. Dimensions and constraints of Chinese PAS
            Grammar                            Pragmatics
Quantity    One Lexical Argument Constraint    One New Argument Constraint
Role        Lexical O Constraint               New O Constraint
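The four constraints in Table 8 can be read as checks applied to the coded arguments of a single clause. The sketch below is one possible rendering, not a procedure given in the paper: the function `check_pas` and the tuple layout are hypothetical, Oblique is treated as a preferred role alongside O (following the "O (or Oblique)" wording above), and the information status assigned to the example clause is our own illustrative annotation.

```python
# Roles that preferentially host lexical and new arguments in the Chinese PAS.
PREFERRED_ROLES = {"O", "OBL"}


def check_pas(clause):
    """Check one clause against the four Chinese PAS constraints.

    clause: list of (role, morph, status) tuples, e.g. ("O", "lexical", "new").
    A False value marks a dispreferred pattern, not an ungrammatical one,
    since the constraints are statistical tendencies.
    """
    lexical_roles = [role for role, morph, _ in clause if morph == "lexical"]
    new_roles = [role for role, _, status in clause if status == "new"]
    return {
        "one_lexical_argument": len(lexical_roles) <= 1,
        "lexical_o": all(r in PREFERRED_ROLES for r in lexical_roles),
        "one_new_argument": len(new_roles) <= 1,
        "new_o": all(r in PREFERRED_ROLES for r in new_roles),
    }


# Clause (1a): pronominal A "ta", lexical O "ta nupengyou" (assumed new here).
preferred = check_pas([("A", "pronoun", "given"), ("O", "lexical", "new")])

# A constructed dispreferred clause: two lexical, new arguments, one in A.
dispreferred = check_pas([("A", "lexical", "new"), ("O", "lexical", "new")])
```

Because the grammatical and pragmatic checks run over the same records, the parallelism between the two columns of Table 8 shows up as two pairs of structurally identical tests.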
Comparing the Chinese PAS with the one Du Bois proposed for the languages he has studied, such as Sacapultec Maya, as shown in Table 9, it is clear that Du Bois's PAS cannot be completely generalized to Chinese, at least as far as the narrative discourse genre is concerned. The difference lies in the distribution of lexical
arguments and new referents across grammatical roles. Whereas the PAS in Sacapultec avoids mentioning new lexical arguments in the A role, Chinese speakers disfavor both the A and S roles and strongly prefer O.
Table 9. Dimensions and constraints of PAS in Chinese and Sacapultec
            Grammar
            Chinese                  Sacapultec
Quantity    One Lexical Argument Constraint (both languages)
Role        Lexical O Constraint     Non-Lexical A Constraint

            Pragmatics
            Chinese                  Sacapultec
Quantity    One New Argument Constraint (both languages)
Role        New O Constraint         Given A Constraint
3. Implication of PAS to Discourse 
Understanding 
Chinese, like a number of other languages whose pattern of information flow in spoken narrative discourse has been investigated to date, also exhibits PAS. This suggests that a strong discourse pressure drives the various grammatical patternings in different languages, and the universality of the PAS Du Bois proposed is thereby challenged. However, from the computational point of view, whether PAS is universal or language-specific, its existence has significant implications for discourse understanding, in that the flow of information throughout a connected discourse is highly structured, with a corresponding grammatical patterning as far as quantity and role are concerned. Therefore, it is possible to identify the information status of an argument by virtue of grammatical analysis, so that grammatical and pragmatic processing can be carried out simultaneously. Even though PAS is not categorical in nature, a discourse understanding model can still use it as a heuristic device to process the information structure of a connected spoken discourse.
In short, a discourse understanding model employing PAS for information processing should take the following points into consideration:
(a) Clauses are the basic information processing units.
(b) Transitive and intransitive clauses should be separated for analysis.
(c) The morphological type, grammatical role, and information status should be recorded for each argument position.
(d) The quantity and role constraints are the heuristic principles for information processing.
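Points (c) and (d) suggest how a model might guess information status from grammatical analysis alone. The following is only an illustrative sketch, not a component described in this paper: the function names are hypothetical, the counts are those of Table 6, and the rule that pronouns are never candidates follows from the observation that new referents in Chinese must be coded as full NPs.

```python
# Counts of new referents per role and role totals, taken from Table 6.
NEW_COUNTS = {"A": 12, "S": 24, "O": 122, "OBL": 81}
ROLE_TOTALS = {"A": 407, "S": 238, "O": 363, "OBL": 208}


def new_rate(role):
    """Observed proportion of arguments in this role that are new."""
    return NEW_COUNTS[role] / ROLE_TOTALS[role]


def candidate_for_new(clause):
    """Pick the argument most likely to carry new information.

    clause: list of (role, morph) pairs from grammatical analysis.
    Returns the index of the best candidate, or None when the clause has
    no lexical argument. By the One New Argument tendency, at most one
    candidate is proposed; the result is a ranking heuristic, and most
    lexical arguments are in fact not new.
    """
    candidates = [
        (new_rate(role), index)
        for index, (role, morph) in enumerate(clause)
        if morph == "lexical"  # pronouns are excluded: new referents are full NPs
    ]
    if not candidates:
        return None
    return max(candidates)[1]


# A transitive clause with a lexical A and a lexical O: O is the better guess.
best = candidate_for_new([("A", "lexical"), ("O", "lexical")])
```

A heuristic of this kind only narrows the search; a full model would still need to confirm the guess against the record of referents already mentioned in the discourse.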
4. Conclusion 
In this paper, we have demonstrated that Chinese narrative discourse also displays Preferred Argument Structure, based on the quantity and distribution of lexical arguments and new referents across grammatical roles. However, the Chinese PAS challenges the universality of the one Du Bois proposed, because the two differ in the distribution of lexical arguments and new referents across grammatical roles. In other words, the discourse pressure driving the various grammatical patternings in different languages reflects the underlying pragmatic preferences of the different groups of language users.
From the computational viewpoint, whether PAS is universal or language-specific, its existence has significant implications for discourse understanding. On the one hand, PAS can function in a discourse understanding model as a heuristic device to process the information structure of a connected spoken discourse; on the other hand, the information status of an argument can be identified by virtue of grammatical analysis, since the flow of information has a corresponding grammatical patterning.
References 
Chafe, Wallace L. 1987. Cognitive Constraints on Information Flow. In Russell S. Tomlin (ed.), Coherence and Grounding in Discourse. Amsterdam: Benjamins.
Chafe, Wallace L. 1988. Linking Intonation Units in Spoken English. In Haiman & Thompson (eds.), Clause Combining in Discourse and Grammar. Amsterdam: Benjamins.
Du Bois, John W. 1987. The Discourse Basis of Ergativity. Language 63, 805-855.
Givon, Talmy (ed.). 1983. Topic Continuity in Discourse: A Quantitative Cross-Language Study. Amsterdam: Benjamins.
Li, Charles, & Thompson, Sandra. 1981. Mandarin Chinese: A Functional Reference Grammar. California: University of California Press.
