A Constraint-Based Approach to Linguistic Performance* 
HASIDA, KSiti 
Tokyo University 
7-3-1, Hongo, Bunkyo-ku, Tokyo 113, Japan 
Electrotechnical Laboratory 
1-1-4, Umezono, Tukuba, Ibaraki 103, Japan 
Institute for New Generation 
Computer Technology (ICOT) 
Mita-Kokusai Bldg. 21F, 1-4-28, 
Mira, Minato-ku, Tokyo 108, Japan 
-I-81-3-456-3194, hasida@icot.jp 
Abstract 
This paper investigates linguistic performance, from 
the viewpoint that the information processing in cogni- 
tive systems should be designed in terms of constraints 
rather than procedures in order to deal with partiality 
of information. In this perspective, the same gram- 
mar, the same belief and the same processing architec- 
ture should underly both sentence comprehension and 
production. A basic model of sentence processing, for 
both comprehension and production, is derived along 
this llne of reasoning. This model is demonstrated to 
account for diverse linguistic phenomena apparently 
unrelated to each other, lending empirica! support to 
the constraint paradigm. 
1 Introduction 
All the cognitive agents, with limited capacity for infor- 
mation processing, face partiality of information: In- 
formation relevant to their activities is only partially 
accessible, and also the distribution pattern of the ac- 
cessible information is too diverse to predict. In sen- 
tence comprehension, for example, the phonological or 
morphological information may or may not be partially 
missing due to some noise, the semantic information 
may or may not be abundant because of familiality or 
ignorance on the topics, and so forth. Thus the infor- 
mation distribution is very diverse and rather orthogo- 
nal to the underlying information structure consisting 
of the modules of morphological, syntactic, pragmatic, 
and other constraints. 
This diversity of information distribution gives rise 
to a very complex, non-modular flow of information in 
cognitive processes, as information flows from places 
possessing information to places lacking information. 
In order to deal with this complexity, a cognitive sys- 
tem must be designed to include two different logical 
layers: 
'(1) Information represented in terms of constraints, 
*The work reported here started as the author's doctoral re- 
search at Tokyo University, and has developed filrther at Elec- 
trotechnical Laboratory and ICOT. The author's current affil- 
iation is ICOT. ltis thanks go to Prof. YAMADA Itisao, who 
was the supervisor of the doctoral program, and too many other 
people to enu,nerate here. 
by abstracting away information flow. 
(2) A general processing mechanism to convey infor- 
mation across constraints, from places possessing 
information to places lacking it. 
>'on-modular flow of information may be captured on 
the basis of modular design of cognitive architecture, 
only by separating the representation of underlying in- 
formation (as (1)) and flow of information (as (2)) fl'om 
each other. 
Procedural approaches break down under partial- 
ity of information, because procedures stipulate, and 
hence restrict, information flow. If one. be it human 
or nature, were to implement such diverse information 
flow by procedural programming, the entire system 
would quickly become too complex to keep track of, 
failing to maintain the modularity of the system. This 
is what has always happened, for example, in the de- 
velopment of natural language processing systems. 
The rest of the paper exemplifies the efficacy of the 
constraint paradig,n with regard to natural language. 
We wil! first discuss a general picture of language fac- 
ulty immediately obtained fl'om the constraint-based 
view, and then derive a model of sentence processing 
neutral between comprehension and production. This 
model will be shown to fit several linguistic phenom- 
ena. Due to the generality of the perspective, the phe- 
nomena discussed below encompass apparently unre- 
lated aspects of natural language. 
2 Language and Constraint 
From tile constraint-based perspective immediately 
follows a hypothesis tI'lat the same constraints (i.e., 
lexical, syntactic, semantic, pragmatic, and whatever), 
corresponding to (1), and the same processing architec- 
ture, corresponding to (2), should underly both sen- 
tence comprehension and production. Other authors 
have expressed less radical stances. For instance, Kay 
\[11\] adopts two different grammars for parsing and gen- 
eration. Our hypothesis is also stronger than Shieber's 
\[16\]; Although he proposes to share not only one gram- 
mar but also one processing architecture between the 
two tasks, this 'common' architecture is, unlike ours, 
parameterized so as to adapt itself to parsing and 
generation in accordance with different parameter set- 
tings. 
1 
149 
As a corollary of our strong uniformity hypothesis, 
we reject every approach postulating any procedure 
specific to sentence comprehension or production. For 
instance, we disagree upon the ways in which the De- 
terminism Hypothesis (DH) \[12\] has been instantiated 
so far. DH permits to assume only one partial struc- 
ture of a sentence at a time, and the approaches along 
this line \[2, 3, 12, 14\] has postulated, beyond necessity, 
specific ways of disambiguation for specific types of 
ambiguity in sentence comprehension and production. 
Instead we view sentence processing as parallel com- 
putation. When a sentence is either comprehended or 
produced, several partial structures of it, we assume, 
are simultaneously hypothesized. The degree of par- 
allelism should be limited to fall within the small ca- 
pacity of the short-term memory (STM), so that we 
obtain the same sort of predictions as we do along 
the determinist account. For instance, the difficulty 
in comprehending garden path sentences like (3) may 
be attributed to the difficulty of keeping some struc- 
tural hypotheses in STM. 
(3) The chocolate cakes are coated with tastes sweet. 
As discussed below, our approach quantitatively esti- 
mates the difficulty in processing embedded construc- 
tions like (4) also on the basis of the memory limita- 
tion. 
(4) The cheese the rat the cat the dog chased caught 
bit was rotten. 
Since DH does not account for such difficulty, inciden- 
tally, it seems superfluous to postulate DH. We con- 
sider DH jnst~ as approximation of severe memory lim- 
itation, and avoid any stipulation of such a hypothesis. 
3 A Common Process Model 
Among the partial structures hypothesized during 
comprehension or production of a sentence, we pay at- 
tention to the maximal st~'uctures; the structures such 
that there is no larger structures. Here we say one 
structure is larger than another when the former in- 
eludes the latter. For example, \[s \[NP Tom\] \[vp sleeps\]\] 
is larger than \[s \[NP Tom\] VP\]. Sentence processing, 
whether comprehension or production, is regarded as 
parallel construction of several maximM structures. 
Thus sentence processing as & whole is characterized 
by specifying what a maximal structure is. 
We assume the grammatical structure of a sen- 
tence to be a binary tree. Here we identify a word 
with its grammatical category, so that a local struc- 
ture, such as \[NP Tom\], is regarded as one node rather 
than a partial tree consisting of two distinct nodes. 
It is just for expository simplification that we as- 
sume binary trees. Our account can be generalized 
straightforwardly to allow n-ary trees. Further, the 
essence of our discussion below is neutral between the 
constituency-based approaches and the dependency- 
based approaches. Here we employ a representation 
scheme of the former type, without committing our- 
selves to the constituency-based framework. 
From the general speculation below, it follows that a 
maximal structure should be the left-hand half of (5). 
(5) s 
This maximal structure consists of the path form S to 
A and the part to the left of this path, except for Bi-1 
and the nodes between Bi-1 and Ai (those on tile slant 
dotted lines) for 1 < i < d+l;Aiandthenodesbe- 
tween Ai and Bi are included in the maximal structure. 
Here B0 and Ad+l stand for S and A, respectively. Ai is 
a leftmost descendant (not necessarily the left daugh- 
ter) of Bi_l or they are identical for 1 _< i < d+l. 
Bi is a rightmost descendant (not necessarily the right 
d&v.ghter) of Ai for 1 G i < d. Thus our model is 
similar to left-corner parser \[1\], though our discussion 
is not restricted to parsing. 
This characterization of a maximal structure is ob- 
tained as follows. First note that a maximal structure 
involves n words and n- i nonterminal nodes, for some 
natural number n; In the maximal structure in (5), the 
connected substructure containing Ai (l <; i _< d) 
contains as many nonterminal nodes as words, so that 
the maximal structure also contains as many nonter- 
minal nodes as words, except for word A. Note further 
that the entire sentence structure, being a binary tree, 
also involves one less nonterminal nodes than words. 
Accordingly, postulating n - 1 nonterminM nodes ver- 
sus n words in a maximal structure amounts to postu- 
lating that the words and the nonterminal nodes are 
processed at approximately constant speed relative to 
each other. 1 The number of words is a measure of lexi- 
cal information, and the number of nonterminal nodes 
is a measure of syntactic and semantic information, 
among others. Hence if all the types of linguistic in- 
formation (lexical, syntactic, semantic, etc.) are pro- 
cessed at approximately the same relative speed, then a 
maximal process should include nearly as many words 
as nonterminal nodes. 
This premise is justified, because if different types 
of information were processed at different speeds, then 
tThe rate of n words versus n - 1 nonterminals does not 
precisely represent the constant relative speed, but the dis- 
crepancy here is least possible and thus acceptable enough as 
approximat ion. 
150 2 
there would arise imbalance of information distribu- 
tion across the corresponding different domains of in- 
formation. Such imhalance should invoke information 
flow from the domains with higher density to the do- 
mains with lower density of information distribution, 
when, as in the case of language, those domains of in- 
formation are tightly related with each other. That is, 
information flow eliminates such imbalance, resulting 
in approximately the same speed of processing across 
different but closely related domains of information. 
Now that we have worked out how many nodes a 
maximal structure includes, what is left is which nodes 
it includes. Let us refer to A in (5) as the current ac- 
tive word and the path from the root node S to the 
current active word as the current active path. It is 
natural to consider that a maximal structure includes 
the nodes to the left of the current active path, be- 
cause all the words they dominate have already been 
proce,;sed. Thus we come up with the above formula- 
tion of a maximal structure, if we notice that the nodes 
on the solid-line part (including Ai) of the current ac- 
tive path in (5) are adjacent to nodes to the left of the 
current active path, whereas the other nodes on the 
current active path (those on the dotted lines, includ- 
ing Bi) do not except for the mother of A, which will 
be processed at the next moment. 
4 Immediate Processing 
According to this model, any word should be in,me- 
diately processed, particularly in parsing, in the sense 
that corresponding amount of syntactic and semantic 
structure is tailored with little delay. The intrasen- 
tential status of a word is hence identified as soon as 
it is encountered. This contrasts with the determinist 
accounts which ,'assume lookahead to deal with local 
ambiguity. 
Empirical evidences support our position. In 
Marslen-Wilson's \[13\] experiment, for instance, the 
subjects were asked to listen to a tape-recorded utter- 
ance and to say aloud what they hear with the short- 
est possible delay. Some subjects performed this task 
with a lag of only about one syllable, and yet their er- 
ror reflected both syntactic and semantic context. For 
example, one of such a subjects said lie had heard that 
the Brigade ... upon listening to He had heard at the 
Brigade .... Such a phenomenon cannot be accounted 
for in terms of the determinist accounts with fixed pars- 
ing procedures. In our model, it is explained by just 
assuming that only the most active maximal structure 
tailored by the subject survives the experimental situ- 
ation. 
5 ~.~ansient Memory Load 
By transient memory load (TML) we refer to tile 
amount of linguistic information temporarily stored 
in STM. The measurements of TML during sentence 
processing proposed so far include the depth of center 
embedding (CE) \[5\] and that of self embedding (SE) 
\[15\]. A syntactic constituent a is centeroembedded in 
another syntactic constituent /3 when /3 = -rc~5 for 
some non-null strings 7 and £ We further say that c, 
is self-embedded in /3 when they are of the same sort 
of category, say NP. 
However, neither CE nor SE can explain why (6) is 
much easier to understand than (7). 
(6) 2bm knows the story that a man who lived in 
Helsinki and his wife were poor but they were 
happy. 
(7) Tom knows that the story on the fact that the 
rumor that Mary killed John was false is funny. 
Note that these sentences are of about the same length; 
The former consists of 20 words and the latter 19 
words. Almost all my informants (including both na- 
tive and non-native speakers of English) reported that 
(6) is easier to understand than (7). Those who felt 
contrariwise ascribed the difficulty of (6) to the ambi- 
guity concerning the overall structure of the cornple- 
meI~t clause after that. 
The approach based on CE fails to account for this 
difference, because the maximum CE depth of (6) a:.d 
that of (7) are both 3, as is shown below. 
(8) \[0Tom knows the story that \[la man \[2 who 
\[3lived\] in Helsinki\] and his wife were poor\] but 
they were happy\] 
(9) \[0 Tom knows that \[~ the story on the fact that 
\[2 the rumor that Mary \[a killed\] John\] was false\] 
is funny\] 
The maximum SE depth cannot distinguish these sen- 
tences: 
(10) Tom knows \[NPo tile story that \[NP~ a man who 
lived in \[NP~ Helsinki\] and his wife\] were poor but 
they were happy\] 
(11) Tom knows that \[NP0 the story on the fact that 
\[NP, the rumor that \[NP2 Mary\] killed John\] was 
false\] is funny. 
Our model provides a TML measure which accounts 
for the contrast in question. In order to plug a maximal 
structure with the rest of the sentence in a grammati- 
cal manner, one must remember only the information 
contained in the categories o11 the border between the 
maximal structure and the remaining context; i.e., cat- 
egories Ai, the mother of Bi (1 ~ i _< d) and A in 
(5). Thus the value of d in (5) could serve as a TML 
measure. As is illustrated in (12) and (13), in fact, 
the maximum of d is 2 and 3 for (6) and (7), respec- 
tively, explaining why (6)is easier. In (12) and (13), 
enclosed in boxes are the nodes corresponding to A,, 
Bi(1 < i < d) and A when d is ttle maximum; i.e., 2 
in the former and 3 in tile latter. 
151 3 
(~-) 
NP \[ 
Tom 
VPo 
V NPo 
knows NP So 
the story Co 
that S~ but 
NPt VP 
P were poor 
NP St his wife 
a man Cornp Sa 
wLo 
S 
they were happy 
lived P NP2 
I I in Helsinki 
(13) 
NP 
t Tom 
VPo 
V~go Aw+ ooi  
22 
PP is funny 
p NP 
on N $1 
the fact Comp 
tl P 
NP $2 was false 
the rumor Comp Sa 
NP2 
Mary NP L 
killed John 
NP 
the story 
152 4 
6 Language Acquisition 
The Dutch language exhibits a type of cross-serial de- 
pendency (CSD) in subordinate clauses: 
(14) ...dat Wolf de kinderen Marie 
... that Wolf the children Marie 
zag helpen zwemmen 
see-PAST help-INF swim-INF 
'... that Wolf saw the children help Marie swim' 
Our .theory predicts that children learning Dutch come 
to recognize the CSD constructions "as having the fol- 
lowing structure, which coincides with the structure 
figured out by Bresnan et al. \[4\] ~ based on an analysis 
of adult language. 
(15) S 
NP0 VP 
Xo Zo 
//"...... ..t/./......... 
N P ~ '"... Vo ..... .... "%. .. 
%. '. % 
Xm_ ~ Z,,_ a /'-,, 
NP,,-1 NP,, V,,:1 V, 
ttere Vo is a finite verb and V; is an infinite verb for 
1 < i < n. Vi is a causative verb or a perception 
verb for 1 < i < n. NPl is the subject of Vi for 
0 < i < n, and NPl is an object of V, forn < i <_ m 
(m > m). Note that NP~,...NP,, and V0,"'V~ 
constitute right-branching structures dominated by X0 
and Zo, respectively. 
Let us look at how a child regard a simple CSD con- 
struction (16) to be (17), which is an instance of (15) 
for m = n = 1. 
(16) ,.. dat Wolf/vlarie zag zwemmen 
... that Wolf Marie see-PAST swim-INF 
L.. that Wolf saw Marie swim' 
(17) s 
NPo VP 
W!lf NPI Zo I /""- 
Marie V0 Va 
I I zag zwemmen 
According to our model, the relevant part of the most 
active maximal structure would look like the following 
2(15) is slightly different from the structure proposed by Bres- 
nan et al., because we regard a sentence structure as a binary 
tree whereas their proposal involves tertiary branching obtained 
by equating VP and X0 in (15). This difference is irrelevant to 
the essence of the following disc.ssion. 
when zag has just been acknowledged, provided that 
the child has already acquired the standard structure 
of a subordinate clause, in which the finite verb appears 
at the end. 
(18) S 
NPo VPo 
I 1 Wol 
f V P 1 
NPt 
I Marie 
Z0 
Vo 
I zag 
VPo, VP1, Zo and Vo correspond to B,~-t, Aa, Bd and 
A in (5), respectively (so that VPo and Zo are not 
included in the maximal structure here). When zwem- 
men is encountered, category \[v, zwemmen\] must be 
inserted either between VPo and VPI or between Zo 
and Vo. In the alleged subordinate clause construe r- 
tion, Zo (which might be identical to Vo) has a direct 
access to \[NPj Marie\], which is the object of zag, the al- 
leged head of Zo. On the other hand, VP1 lacks such an 
access, because the relationship between Marie and zag 
is established not through but under VP~. It is hence 
more preferable that \[v~ zwemmen\] attaches beneath 
Zo, if the child has already perceived extralingulsti- 
eally the situation being described, in which Marie is 
swhnming. Now the most active maximal structure 
should look like this (Zo and Z1 are excluded from this 
maximal structure if they are distinct from Yo and V1, 
respectively): 
(19) zo 
Yo 
Vo Zt 
zag Va 
I zwemmen 
(17) is ttms obtained by setting VPo = VPh Zo = Yo, 
attd Zl = Vl. 
Note that this reasoning essentially relies oil our for- 
mulation of a maximal process. If a bottom-up model 
were assumed instead, for instance, there would be no 
immediate reason to exclude a structure, say, as fol- 
lows. 
(2o) S 
NPo VP 
I Wolf U V1 
NP1 Vo zwemmen 
Marie zag 
5 153 
The above discussion can be extended to cover more 
complex cases (where m > 1 in (15)) in a rather 
straightforward manner, as is discussed by Hasida \[6\]. 
The structure under Xo is tailored as a natural ex- 
tension of the way an ordinary subordinate clause is 
processed, then Vo is inserted beneath VP, following 
the ordinary structure of a subordinate clause together 
with the semantic information about the situation de- 
scribed, and Vi attaches near to Vi-~ for 1 < i < n 
due to the semantic information again. The structure 
under Z0 must be right-branching so that V0 be the 
head of VP. 
Also by reference to the current model, Hasida \[7\] 
further gives an account of the unacceptability of some 
unbounded dependency constructions in English which 
is hard to explain in static terms of linguistics. 
7 Concluding Remarks 
We have begun with a general constralnt-based per- 
spective about the cognitive mechanism, and shown 
that a model of sentence processing derived thereof, 
neutral between comprehension and production, ac- 
counts for several linguistic phenomena seemingly un- 
related to each other. It has thus been demonstrated 
that the speculation to derive the model has empir- 
ical supports, lending justification for the constraint 
p~radigm. In particular, our theory has been shown 
to be more adequate than the determinist approach, 
which must postulate a procedural design of the hu- 
man language faculty. 
A computational formalization of our model will be 
possible in terms of constraint programming, as dis- 
cussed by Hasida et al. \[8, 9, 17\]. Most of the time, 
a natural language processing system in terms of pro- 
cedural programming has been designed to be a series 
of a syntactic analysis procedure, a semantic analysis 
procedure, a pragmatic analysis procedure, and so on, 
in order to reflect the modularity of the underlying 
constraints. }towever, such a design imposes a strong 
limitation on information flow, restricting the system's 
ability to a very narrow range of context. One natu- 
rally attempts to remedy this so as to, say, enable the 
syntactic analysis module to refer to semantic infor- 
mation, but this attempt must destroy the modularity 
of the entire design, ending up with a program too 
complicated to extend or even maintain. Constraint 
paradigm seems to be the only way out of this diffi- 
culty. 

References 

\[1\] Aho, A. V. and Ullman, U. D. (1972) The Theory 
of Parsing, Translation and Compiling, Prentice- 
Hall. 

\[2\] Berwick, R. C. and Weinberg, A. (1984) The 
Grammatical Basis of Linguistic Performance, 
MIT Press. 

\[3\] Berwick, R. (1985) The Acquisition of Syntactic 
Knowledge, MIT Press. 

\[4\] Bresnan, J. Kaplan, R. M., Peters, S. and Zaenen, 
A. (1982) 'Cross-serlal Dependencies in Dutch,' 
Linguistic Inquiry, Vol. 13, pp. 613-635. 

\[5\] Church, K. W. (1980) On Memory Limitations in 
Natural Language Processing, MIT/LCS/TR-245, 
Laboratory for Computer Science, Massachusetts 
Institute of Technology. 

\[6\] tIasida, K. (1985) Bounded Parallelism: A Theory 
of Linguistic Performance, doctoral dissertation, 
University of Tokyo. 

\[7\] Haslda, K. (1988) 'A Cognitive Account of Un- 
bounded Dependency,' in Proceedings of COL- 
ING'88, pp. 231-236. 

\[8\] Hasida, K. (1989) A Constraint-Based View of 
Language, presented at the F~rst Conference on 
Situation Theory and its Applications. 

\[9\] ttaslda, K. and Ishlzaki, S. (1987)'Dependency 
Propagation: A Unified Theory of Sentence Com- 
prehension and Generation,' Proceedings of IJ- 
CAI'87, pp. 664-670. 

\[10\] Kaplan, R. M. (1972) 'Augmented Transition Net- 
works as Psychological Models of Sentence Com- 
prehension,' Artificial Intelligence, Vol. 3, pp. 77-100. 

\[11\] Kay, M. (1985) 'Parsing in Functional Unifica- 
tion Grammar,' in Dowty, D., Karttunen, L. 
and Zwicky, A. M. (eds.) Natural Language Pars- 
ing: Psychological, Computational, and Theoreti- 
cal Perspectives, Cambridge University Press. 

\[12\] Marcus, M. P. (1980) A Theory of Syntactic 
Recognition for Natural Language, MIT Press. 

\[13\] Marslen-Wilson, W. D. (1975) 'Sentence Percep- 
tion as an Interactive Parallel Process,' Science, 
Vol. 189, pp. 226-228. 

\[14\] McDonald, D. (1980) Natural Language Pro- 
duction as a Process of Decision Making un- 
der Constraint, Doctoral Dissertation, Laboratory 
of Computer Science, Massachusetts Institute of 
Technology. 

\[15\] Miller, G. A. and Chomsky, N. (1963) 'Finitary 
Models of Language Users,' in Luee, R. D., Bush, 
R. K., and Galanter, E. tlandbook of Mathematical 
Psychology, Vol. lI, pp. 419-491, John Wiley and 
Sons. 

\[16\] Shieber, S. M. (1988) 'A Uniform Architecture for 
Parsing and Generation,' in Proceedings of COL- 
ING'SS, pp. 614-619. 

\[17\] Tuda, H., Hasida, K., and Sirai, H. (1989) 'JPSG 
Parser on Constraint Logic Programming,' Pro- 
ceedings of the European Chapter of ACL'89. 
