CONTROL STRUCTURES AND THEORIES OF INTERACTION 
IN SPEECH UNDERSTANDING SYSTEMS
E.J. Briscoe and B.K. Boguraev 
University of Cambridge, Computer Laboratory 
Corn Exchange Street, Cambridge CB2 3QG, England 
ABSTRACT 
In this paper, we approach the problem of organisation
and control in automatic speech understanding systems
firstly, by presenting a theory of the non-serial
interactions necessary between two processors in the
system; namely, the morphosyntactic and the prosodic,
and secondly, by showing how, when generalised, this 
theory allows one to specify a highly efficient 
architecture for a speech understanding system with a 
simple control structure and genuinely independent 
components. The theory of non-serial interactions we 
present predicts that speech is temporally organised in 
a very specific way; that is, the system would not
function effectively if the temporal distribution of 
various types of information in speech were different. 
The architecture we propose is developed from a study 
of the task of speech understanding and, furthermore, is
specific to this task. Consequently, the paper argues 
that general problem solving methods are unnecessary 
for speech understanding. 
I INTRODUCTION
It is generally accepted that the control structures of
speech understanding systems (SUSs) must allow for
non-serial interactions between different knowledge
sources or components within the system. By non-serial
interaction (NSI) we refer to communication
which extends beyond the normal, serial, flow of
information entailed by the tasks undertaken by each
component. For example, the output of the word
recognition system will provide the input to
morphosyntactic analysis, almost by definition;
however, the operation of the morphosyntactic
analyser should be constrained on some occasions by
prosodic cues: say, that her is accented and followed
by a "pause", whilst dog is not, in
(1) Max gave her dog biscuits.
Similarly, the output of the morphosyntactic analyser
will provide the input to semantic analysis, but on
occasion, the operation of the morphosyntactic
analyser will be more efficient if it has access to
information about the discourse: say, that the horse
has no unique referent in
(2) The horse raced past the barn fell,
because this information will facilitate the reduced 
relative interpretation (see Crain & Steedman, in 
press). Thus, NSIs will be required between 
components which occur both before and after the 
morphosyntactic analyser in the serial chain of
processors which constitute the complete SUS. 
NSIs can be captured in a strictly serial,
hierarchical model, in which the flow of information is 
always "upwards", by computing every possibility 
compatible with the input at each level of processing. 
However, this will involve much unnecessary 
computation within each separate component which 
could be avoided by utilising information already 
temporally available in the signal or context of
utterance, but not part of the input to that level. An
alternative architecture is the heterarchical system; 
this avoids such inefficiency, in principle, by allowing 
each component to communicate with all other
components in the system. However, controlling the 
flow of information and specifying the interfaces 
between components in such systems has proved very 
difficult (Reddy & Erman, 1975). The most
sophisticated SUS architecture to date is the
blackboard model (Erman et al., 1980). The model
provides a means for common representation and a 
global database for communication between 
components and allows control of the system to be 
centralised and relatively independent of individual
components. The four essential elements of the model -
blackboard entries, knowledge sources, the
blackboard and an intelligent control mechanism -
interact to emulate a problem solving style that is
characteristically incremental and opportunistic. NSIs
are thus allowed to occur, in principle, when they will
be of greatest value for preventing unnecessary 
computation. 
What is striking about these system architectures 
is that they place no limits on the kinds of interaction
which occur between components; that is, none of
them are based on any theory of what kind of
interactions and communication will be needed in a
SUS. The designers of the Hearsay-II system were
explicit about this, arguing that what was required
was an architecture capable of supporting any form of
interaction, but which was still relatively efficient 
(Erman & Lesser, 1975:484). 
There appear to be at least two problems with such an
approach. Firstly, the designer of an individual
component must still take into account which other
components should be activated by its outputs, as well
as who provides its inputs, precisely because no
principles of interaction are provided by the model. This
entails, even within the loosely structured aggregation
hierarchy of the blackboard, some commitment to
decisions about inter-component traffic in information -
rational answers to these decisions cannot be provided
without a theory of interaction between individual
components in a SUS.
Secondly, a considerable amount of effort has gone
into specifying global scheduling heuristics for
maintaining an agenda of knowledge source activation
records in blackboard systems, and this has sometimes
led to treating the control problem as a distinct issue
independent of the domain under consideration,
localising it on a separate, scheduling, blackboard
(Balzer, Erman and London, 1980; Hayes-Roth, 1983a).
Once again, this is because the blackboard framework,
as it is defined, provides no inherent constraints on
interactions (Hayes-Roth, 1983b). While this means that
the model is powerful enough to replicate control
strategies used in qualitatively different AI systems, as
well as generalise to problem-solving in multiple domains
(Hayes-Roth, 1983a), the blackboard method of control
still fails to provide a complete answer to the scheduling
problem. It is intended predominantly for solving
problems whose solution depends on heuristics which
must cope with large volumes of noisy data.
In the context of a blackboard-based SUS, where
the assumption that the formation of the "correct"
interpretation of an input signal will, inevitably, be
accompanied by the generation of many competing
(partial) interpretations is implicit in the redundancy
encoded in the individual knowledge sources, the only
real and practical answer to the control problem
remains the development of global strategies to keep
unnecessary computation within practical limits. These
strategies are developed by tuning the system on the
basis of performance criteria; this tuning appears to
limit interactions to just those optimal cases which are
likely to yield successful analyses. However, insofar as
the final system might claim to embody a theory about
which interactions are useful, this will never be
represented in an explicit form in the loosely structured
system components, but only implicitly in the run-time
behaviour of the whole system, and therefore is
unlikely to be recoverable (see the analogous criticism in
Hayes-Roth, 1983a:55).
I INTERACTIVE DETERMINISM: 
A THEORY OF NON-SERIAL INTERACTION 
In this section, we concentrate on the study of NSI 
between morphosyntactic and prosodic information in
speech, largely from the perspective of
morphosyntactic analysis. This interaction occurs 
between two of the better understood components of a 
SUS and therefore seems an appropriate starting point 
for the development of a theory of NSIs. 
Lea (1980) argues that prosodic information will
be of use for morphosyntactic processing. This
discussion is based on the observation (see Cooper &
Paccia-Cooper, 1980; Cooper & Sorenson, 1981), that
there is a strong correlation between some syntactic 
boundaries and prosodic effects such as lengthening, 
step up in fundamental frequency, changes of 
amplitude and, sometimes, pausing. However, many of 
these effects are probably irrelevant to 
morphosyntactic analysis, being, for example, side 
effects of production, such as planning, hesitation, 
afterthoughts, false starts, and so forth. If prosody is
to be utilised effectively to facilitate morphosyntactic
analysis, then we require a theory capable of
indicating when an ambiguous prosodic cue such as
lengthening is a consequence of syntactic environment
and, therefore, relevant to morphosyntactic analysis.
None of Lea's proposals make this distinction.
In order to develop such a theory, we require a
precise account of morphosyntactic analysis embedded
in a model of a SUS which specifies the nature of the
NSIs available to the morphosyntactic analyser.
Consider a simple modular architecture of a SUS in
which most information flows upwards through each
level of processing, as in the serial, hierarchical
model. This information is passed without delay, so
the result of any operation performed by a processor
will be passed up to its successor in the chain of
processors immediately (see Fig. 1).
Furthermore, we constrain the model as follows: 
at least from the point of word recognition upwards, 
only one interpretation is computed at each level. 
That is, word recognition returns a series of unique, 
correct words, then morphosyntactic analysis provides 
the unique, correct grammatical description of these 
words, and so forth. In order to implement such a 
constraint on the processing, the model includes, in
addition to the primary flow of information, secondary
channels of communication which provide for the NSIs
(represented by single arrows in the diagram). These
interactive channels are bidirectional, allowing one
component to request certain highly restricted kinds
of information from another component and, in
principle, can connect any pair of processors in a
SUS.
[Fig. 1: diagram of a SUS with the components DISCOURSE,
SEMANTICS, PARSE, WORDS and PROSODY.]
* This diagram is not intended to be complete and is
only included to illustrate the two different types
of communication proposed in this paper.
Imagine a morphosyntactic analyser which builds
a unique structure without backtracking and employs
no, or very little, look-ahead. Such a parser will face a
choice point, irresolvable morphosyntactically, almost
every time it encounters a structural ambiguity,
whether local or global. Further, suppose that this
parser seeks to apply some general strategies to
resolve such choices, that is, to select a particular
grammatical interpretation when faced with ambiguity.
If such a parser is to be able to operate
deterministically, and still return the correct analysis
without error, in cases when a general strategy would
yield the wrong analysis, then it will require
interactive channels for transmitting a signal capable
of blocking the application of the strategy and forcing
the correct analysis. These are the secondary
channels of communication posited in the model of the
SUS above.
A theory of NSIs should specify when, in terms
of the operation of any individual processor,
interaction will be necessary; interactive channels for
this parser must be capable of providing this
information at the onset of any given
morphosyntactic ambiguity, which is defined as the
point at which the parser will have to apply its
resolution strategy. In order to make the concept of
onset of ambiguity precise, a model of the
morphosyntactic component of a SUS was designed
and implemented. This analyser (henceforth LEXICAT, the
LEXical-CATegorial parser - because it employs an
Extended Categorial Grammar (e.g. Ades & Steedman,
1982) representing morphosyntactic information as an
extension of the lexicon) makes specific predictions
about the temporal availability of non-morphosyntactic
information crucial to the theory of NSIs presented
here. LEXICAT's strategy for resolution of ambiguities 
is approximately a combination of late closure 
(Frazier, 1979) and right association (Kimball, 1973). 
LEXICAT is a species of shift-reduce parser which 
employs the same stack for the storage and analysis
of input and inspects the top three cells of the stack
before each parsing operation. Reduction, however,
never involves more than two cells, so the top cell of
the stack acts as a very restricted one word look-ahead
buffer. In general, LEXICAT reduces the items in
cells two and three provided that reduction between
cells one and two is not grammatically possible*.
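The stack discipline just described can be illustrated with a small sketch. It is a toy approximation and not LEXICAT itself: the tuple encoding of categories, the names `combine` and `parse`, the five-word lexicon and the clean-up loop at the end of the input are all assumptions made for illustration, and only forward and backward function application are modelled.

```python
# Toy sketch of the three-cell stack discipline; not the LEXICAT implementation.
# A category is an atom such as "NP" or a tuple (result, slash, argument).

def combine(left, right):
    """Apply one category to its neighbour; return the result or None."""
    # Forward application: X/Y followed by Y gives X.
    if isinstance(left, tuple) and left[1] == "/" and left[2] == right:
        return left[0]
    # Backward application: Y followed by X\Y gives X.
    if isinstance(right, tuple) and right[1] == "\\" and right[2] == left:
        return right[0]
    return None


def parse(categories):
    """Shift each category, reducing cells two and three when allowed."""
    stack = []  # the last element is cell one, the one-word look-ahead
    for category in categories:
        stack.append(category)  # shift
        while len(stack) >= 3:
            if combine(stack[-2], stack[-1]) is not None:
                break  # cells one and two can combine, so hold back
            reduced = combine(stack[-3], stack[-2])
            if reduced is None:
                break
            stack[-3:-1] = [reduced]  # replace cells three and two
    # Assumed clean-up once the input is exhausted: reduce the top two cells.
    while len(stack) >= 2:
        reduced = combine(stack[-2], stack[-1])
        if reduced is None:
            break
        stack[-2:] = [reduced]
    return stack


# Hypothetical lexicon for "the king rides his horse".
VP = ("S", "\\", "NP")
TV = (VP, "/", "NP")
DET = ("NP", "/", "N")
result = parse([DET, "N", TV, DET, "N"])
print(result)
```

No choice point arises in this input, so the sketch shows only the stack mechanics; the default preference and the interactive requests made at a shift-reduce choice are deliberately left out.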
When LEXICAT encounters ambiguity, in the
majority of situations this surfaces as a choice
between shifting and reducing. When a shift-reduce
choice arises between either cells one and two or two
and three, reduction will be preferred by default;
although, of course, a set of interactive requests will
be generated at the point when this choice arises, and
these may provide information which blocks the
preferred strategy. The approximate effect of the
preference for reduction is that incoming material is
attached to the constituent currently under analysis
which is "lowest" in the phrase structure tree. LEXICAT
is similar to recent proposals by Church (1980),
Pereira (in press) and Shieber (1983), in that it
employs general strategies, stated in terms of the
parser's basic operations, in order to parse
deterministically with an ambiguous grammar.
* This is not completely accurate; see Briscoe,
1984:Ch3 for a full description of LEXICAT.
A theory of NSIs should also specify how
interaction occurs. When LEXICAT recognises a choice
point, it makes a request for non-morphosyntactic
information relevant to this choice on all of the
interactive channels to which it is connected; if any of
these channels returns a positive response, the
default interpretation is overridden. The parser is
therefore agnostic concerning which channel might
provide the relevant information; for example, in
analysing
(3) Before the King rides his horse
it's usually groomed.
the onset of the morphosyntactic ambiguity arises
when his horse has been analysed as a noun phrase.
LEXICAT must decide at this point whether rides is to
be treated as transitive or intransitive: the transitive
reading is preferred given the resolution strategy
outlined above. Therefore, an interactive request will
be generated requesting information concerning the
relationship between these two constituents. A simple
yes/no response is all that is needed along this
interactive channel: "yes" to prevent application of the
strategy, "no" if the processor concerned finds
nothing relevant to the decision. In relation to this
example, consider the channel to the prosodic
analyser which monitors for prosodic "breaks" (defined
in terms of vowel lengthening, change of fundamental
frequency and so forth): when the request is received
the prosodic analyser returns a positive response if
such a break is present in the appropriate part of the
speech signal. In (3) none of these cues is likely to
occur since the relevant boundary is syntactically
weak (see Cooper & Paccia-Cooper, 1980), so the
interactive request will not result in a positive
response, the default resolution strategy will apply
and his horse will be interpreted as direct object of
rides. In
(4) Before the King rides his horse
is usually groomed,
on the other hand, an interactive request will be
generated at the same point, but the interactive
channel between the prosodic and morphosyntactic
components is likely to produce a positive response
since the boundary between rides and his horse is
syntactically stronger. Thus, attachment will be
blocked, closing the subordinate clause, and thereby
forcing the correct interpretation.
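The request and response exchange for examples (3) and (4) can be sketched as follows. This is a minimal illustration under stated assumptions: the `Channel` type, the functions `prosodic_break_channel` and `resolve`, the labels "attach" and "close", and the use of a word position to identify the boundary are all hypothetical and are not taken from the LEXICAT implementation.

```python
# Sketch of the yes/no interactive protocol; all names here are hypothetical.
from typing import Callable, Iterable, List

# A channel answers one question: is there relevant evidence at this boundary?
Channel = Callable[[int], bool]


def prosodic_break_channel(break_positions: Iterable[int]) -> Channel:
    """Build a channel that answers yes where a prosodic break was detected."""
    detected = set(break_positions)
    return lambda boundary: boundary in detected


def resolve(default: str, alternative: str, boundary: int,
            channels: List[Channel]) -> str:
    """Keep the default unless some channel returns a positive response."""
    if any(channel(boundary) for channel in channels):
        return alternative
    return default


# The boundary after "rides" is taken to be position 4 (after the fourth word).
BOUNDARY = 4
# Example (3): no break is detected, so the default attachment applies.
sentence_3 = resolve("attach", "close", BOUNDARY, [prosodic_break_channel([])])
# Example (4): a break at the boundary blocks attachment.
sentence_4 = resolve("attach", "close", BOUNDARY, [prosodic_break_channel([4])])
print(sentence_3, sentence_4)
```

Because `resolve` only asks whether any channel answered yes, the caller need not know which channel supplied the evidence, which mirrors the parser's agnosticism described above.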
NSI, then, is restricted to a set of yes/no
responses* over the interactive channels at the
explicit request of the processor connected to those
channels, where a positive response on one interactive
channel suffices to override the unmarked choice
which would be made in the absence of such a signal.
This highly restricted form of interaction is sufficient
to guarantee that LEXICAT will produce the correct
analysis even in cases of severe multiple ambiguity;
for example, analysing the noun compound in
(5) Boron epoxy rocket motor chambers,
(from Marcus, 1980:253), there are fourteen† licit
morphosyntactic interpretations, assuming standard
grammatical analyses (e.g. Selkirk, 1983). However, if
this example were spoken and we assume that it would
have the prosodic structure predicted by Cooper &
Paccia-Cooper's (1980) algorithm for deriving prosody
from syntactic structure, LEXICAT could produce the
correct analysis without error, just through
interaction with the prosodic analyser. As each noun
enters the analyser, reduction will be blocked by the
general strategy but, because LEXICAT will recognise
the existence of ambiguity, an interactive request will
be generated before each shift. The prosodic break
channel will then prevent reduction after epoxy and
after motor, forcing the correct analysis ((boron
epoxy) ((rocket motor) chambers)), as opposed to the
default right-branching structure.
* Possibly these responses should be represented as
confidence ratings rather than a discrete choice.
In this case levels of certainty concerning the
presence/absence of relevant events could be
represented. However, for the rest of this paper we
assume binary channels will suffice.
† Corresponding to the Catalan numbers; see Martin
et al. (1981).
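The count of fourteen interpretations for the five-noun compound can be checked mechanically. The sketch below assumes that each interpretation corresponds to one binary bracketing of the five nouns, which is consistent with the reference to the Catalan numbers; the function name `bracketings` and the nested-tuple encoding are illustrative choices only.

```python
# Enumerate every binary bracketing of a word sequence as nested tuples.

def bracketings(words):
    """Return all binary bracketings of the given sequence of words."""
    words = tuple(words)
    if len(words) == 1:
        return [words[0]]
    results = []
    # Split the sequence at every position and bracket each half recursively.
    for split in range(1, len(words)):
        for left in bracketings(words[:split]):
            for right in bracketings(words[split:]):
                results.append((left, right))
    return results


compound = ["boron", "epoxy", "rocket", "motor", "chambers"]
analyses = bracketings(compound)

# The analysis cited in the text, and the right-branching alternative.
target = (("boron", "epoxy"), (("rocket", "motor"), "chambers"))
right_branching = ("boron", ("epoxy", ("rocket", ("motor", "chambers"))))

print(len(analyses))
print(target in analyses, right_branching in analyses)
```

The enumeration yields fourteen bracketings, the fourth Catalan number, and both the cited analysis and the right-branching structure are among them.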
Thus, NSI between the morphosyntactic and
prosodic components can be captured by a bistable,
bidirectional link capable of transmitting a request
and signalling a binary response, either blocking or
allowing the application of the relevant strategy 
according to the presence or absence of a prosodic 
break. Given the simplicity of this interaction, the 
prosodic analyser requires no more information from 
the parser than that a decision is requested
concerning a particular boundary. Nor need the
prosodic analyser decide, prior to an interactive
request on this channel, whether a particular
occurrence of, say, lengthening is signalling the
presence of a prosodic break, rather than, for instance,
stress, since the request itself will help resolve the
interpretation of the cue. Moreover, we have a simple
generalisation about when interactive requests will be
made since this account of NSIs predicts that prosodic
information will only be relevant to morphosyntactic
analysis at the onset of a morphosyntactic ambiguity. 
If we assume (boldly) that this account of NSI 
between the morphosyntactic and prosodic analysers
will generalise to a complete model of SUS, then such
a model makes a set of predictions concerning the
temporal availability of interactive information in the
speech signal and representation of the context of
utterance. In effect, it claims that the SUS
architecture simply presupposes that language is
organised in the appropriate fashion since the model
will not function if it is not. We call this strong
prediction about the temporal organisation of the
speech signal the Interactive Determinism (ID)
Hypothesis since it is essentially an extension of
Marcus' (1980) Determinism Hypothesis. 
II TESTING 
THE INTERACTIVE DETERMINISM HYPOTHESIS
The ID hypothesis predicts that speech and the
representation of context are organised in such a way
that information will be available, when needed, via
NSI to resolve a choice in any individual component at
the point when that choice arises. Thus in the case of
prosodic interaction with morphosyntactic analysis the
theory predicts that a prosodic break should be
present in speech at the onset of a morphosyntactic
ambiguity which requires a non-default interpretation
and which is not resolved by other non- 
morphosyntactic information. This aspect of the ID 
hypothesis has been tested and corroborated by Paul 
Warren (1983; in prep; also see Briscoe, 1984:Ch4), 
who has undertaken a series of speech production 
experiments in which (typically) ten subjects read 
aloud a list of sentences. This list contains sets of 
pairs of locally ambiguous sentences, and some filler 
sentences so that the purpose of the experiment is 
not apparent to the subjects. Their productions are
analysed acoustically and the results of this analysis
are then checked statistically. The technique gives a
good indication of whether the cues associated with a
prosodic break are present at the appropriate points
in the speech signal, and of their consistency across
different speakers. 
Returning to examples (3) and (4) above, we 
noted that a prosodic break would be required in (4), 
but not (3), to prevent attachment of rides and his
horse. Warren found exactly this pattern of results;
the duration of rides (and similar items in this
position) is on average 51% longer in (4) and the fall
in fundamental frequency is almost twice as great with 
a corresponding step up to horse, as compared to a 
smooth declination across this boundary in (3). 
Similarly, analysing 
(6) The company awarded the contract
[to/was] the highest bidder.
LEXICAT prefers attachment of The company to
awarded, treating awarded as the main verb. In the
case where awarded must be treated as the beginning
of a reduced relative, Warren found that the duration
of the final syllable of company is lengthened and that
the same pattern of fall and step up in fundamental
frequency occurs. Perhaps the most interesting cases
are ambiguous constituent questions; Church
(1980:117) argued that it is probably impossible to
parse these deterministically by employing look-ahead:
"The really hard problem with wh-movement is 
finding the "gap" where the wh-element 
originated. This is not particularly difficult for 
a non-deterministic competence theory, but it 
is (probably) impossible for a deterministic 
processing model." 
LEXICAT predicts that in a sentence such as 
(7) Who did you want to give the presents to Sue?
the potential point of attachment of Who as direct
object of want will be ignored by default in preference
for the immediate attachment of to give. Thus there is
a prediction that the sentence, when spoken, should
contain a prosodic break at this point. Warren has
found some evidence for this prediction, i.e. want is
lengthened as compared to examples where this is not
the correct point of attachment of the preposed
phrase, such as
(8) Who did you want to give the presents to?
but the prosodic cues, although consistent, are 
comparatively weak, and it is not clear that listeners 
are utilising them in the manner predicted by the 
theory (see Briscoe, 1984:Ch4). 
A different kind of support is provided by 
sentences such as 
(9) Before the King rides a servant
grooms his horse.
which exhibit the same local ambiguity as (3) and (4)
but where the semantic interpretation of the noun
phrase makes the direct object reading implausible. In
this case it is likely that an interactive channel
between the semantic and morphosyntactic analysers
would block the incorrect interpretation. So there is a
prediction that the functional load on prosodic
information will decrease and, therefore, that the
prosodic cues to the break may be less marked. This
prediction was again corroborated by Warren who
found that the prosodic break in examples such as (9)
was significantly less marked acoustically than for
examples such as (4)*. In general then, these
experimental results support the ID hypothesis. 
* Note that this result is inexplicable for theories
which attempt to derive the prosodic structure of a
sentence directly from its syntactic structure; see
Cooper & Paccia-Cooper (1980:181f).
III CONTROL STRUCTURE AND ORGANISATION
In a SUS based on the ID model, the main flow of
information will be defined by the tasks of each
component, and their medium of communication will
be a natural consequence of these tasks, as for the
serial, hierarchical model. However, in the ID model,
unlike the hierarchical model, there are fewer
overheads because unnecessary computation at any
level of processing will be eliminated by the NSIs
between components. These interactions will, of
course, require a large number of interactive
channels; but these do not imply a common
representation language because the information
which passes along them is representation-independent
and restricted to a minimal request and a binary
response. Each channel in the full SUS will be
dedicated to a specific interaction between
components; so the morphosyntactic component will
require a prosodic break channel and a unique
referent channel (see example (1)), and so forth.
Thus, a complete model of SUS will implement a theory
of the types of NSI required between all components.
Finally, the ID model will not require that any
individual processor has knowledge of the nature of
the operations of another processor; that is, the
morphosyntactic analyser need not know what is being
computed at the other end of the prosodic break
channel, or how; nor need the prosodic analyser know
why it is computing the presence or absence of a
prosodic break. Rather, the knowledge that this
information is potentially important is expressed by
the existence of this particular interactive channel.
The control structure of this model is 
straightforward; after each separate operation of each 
individual component the results of this operation will
be passed to the next component in the serial chain
of processors. An interactive request will be made by
any component only when faced with an indeterminism
irresolvable in terms of the input available to it. No
further scheduling or centralised control of processing
will be required. Furthermore, although each individual
component determines when NSIs will occur, because
of the restricted nature of this interaction each 
component can still be developed as a completely 
independent knowledge source. 
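This control regime can be sketched with a chain of generators, in which each component hands on each result as soon as it is produced and consults its channels only at a choice point. The sketch is an assumption-laden illustration: the component functions, the word-based test for a choice point and the position-based channel are placeholders standing in for real analysers, and no scheduler or shared data structure is involved.

```python
# Sketch of the serial chain with requests made only at an indeterminism.
# The component names and the choice-point test are placeholders.

def word_recogniser(signal):
    """Pass each recognised word on as soon as it is available."""
    for word in signal:
        yield word


def analyser(words, is_choice_point, channels, requests):
    """Consume words one at a time, consulting channels only at choice points."""
    for position, word in enumerate(words):
        decision = "default"
        if is_choice_point(word):
            requests.append(position)  # record that a request was made
            if any(channel(position) for channel in channels):
                decision = "override"
        yield word, decision


signal = "before the king rides his horse is usually groomed".split()
requests = []
chain = analyser(
    word_recogniser(signal),
    # Placeholder: the choice arises once "his horse" has been analysed.
    is_choice_point=lambda word: word == "horse",
    # Placeholder for a prosodic break channel reporting a relevant break.
    channels=[lambda position: position == 5],
    requests=requests,
)
output = list(chain)
print(requests, output[5])
```

Only one request is issued for the whole utterance, and every other step proceeds without any communication beyond the serial flow.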
The deterministic nature of the individual 
components of this SUS eliminates the need for any
global heuristics to be brought into the analysis of the
speech signal. Thus we have dispensed neatly with the
requirement for an over-powerful and over-general
problem-solving framework, such as the blackboard,
and replaced it with a theory specific to the domain
under consideration; namely, language. The theory of
NSIs offers a satisfactory specific method for speech
understanding which allows the separate specialist
component procedures of a SUS to be
"algorithmetized" and compiled. As Erman et al.
(1980:216) suggest: "In such a case the flexibility of a
system like Hearsay-II may no longer be needed".
The restrictions on the nature and directionality
of NSI channels in a SUS, and the situations in which
they need to be activated, allow a modular system
whose control structure is not much more complex
than that of the hierarchical model, and yet, via the
network of interactive channels, achieves the
efficiency sought by the heterarchical and blackboard
models, without the concomitant problems of common
knowledge representations and complex
communications protocols between separate knowledge
sources. Thus, the ID model dispenses with the
overhead costs of data-directed activation of
knowledge sources and the need for opportunistic
scheduling or a complex focus-of-control mechanism.
IV CONCLUSION 
In this paper we have proposed a very idealised model 
of a SUS with a simple organisation and control 
structure. Clearly, the ID model assumes a greater
level of understanding of many aspects of speech
processing than is current. For example, we have
assumed that the word recognition component is
capable of returning a series of unique, correct lexical
items; even with interaction of the kind envisaged, it
is doubtful that our current understanding of
acoustic-phonetic analysis is good enough for it to be
possible to build such a component now. Nevertheless,
the experimental work reported by Marslen-Wilson &
Tyler (1980) and Cole & Jakimik (1980), for example,
suggests that listeners are capable of accessing a
unique lexical item on the basis of the acoustic signal
and interactive feedback from the developing analysis
of the utterance and its context (often before the
acoustic signal is complete). More seriously, from the
perspective of interactive determinism, little has been
said about the many other interactive channels which
will be required for speech understanding and, in
particular, whether these channels can be as
restricted as the prosodic break channel. For example,
consider the channel which will be required to capture
the interaction in example (9); this will need to be
sensitive to something like semantic "anomaly".
However, semantic anomaly is an inherently vague
concept, particularly by comparison with that of a
prosodic break. Similarly, as we noted above, the
morphosyntactic analyser will require an interactive
channel to the discourse analyser which indicates
whether a noun phrase followed by a potential relative
clause, such as the horse in (2), has a unique
referent. However, since this channel would only seem
to be relevant to ambiguities involving relative clauses,
it appears to cast doubt on the claim that interactive
requests are generated automatically on every channel 
each time any type of ambiguity is encountered. This, 
in turn, suggests that the control structure proposed 
in the last section is oversimplified. 
Nevertheless, by studying these tasks in terms of 
far more restricted and potentially more
computationally efficient models, we are more likely to
uncover restrictions on language which, once 
discovered, will take us a step closer to tractable 
solutions to the task of speech understanding. Thus, 
the work reported here suggests that language is 
organised in such a manner that morphosyntactic 
analysis can proceed deterministically on the basis of
a very restricted parsing algorithm, because non- 
structural information necessary to resolve 
ambiguities will be available in the speech signal (or 
representation of the context of utterance) at the 
point when the choice arises during morphosyntactic
analysis. 
The account of morphosyntactic analysis that
this constraint allows is more elegant, parsimonious
and empirically adequate than employing look-ahead 
(Marcus, 1980). Firstly, an account based on look- 
ahead is forced to claim that local and global 
ambiguities are resolved by different mechanisms 
(since the latter, by definition, cannot be resolved by 
the use of morphosyntactic information further
downstream in the signal), whilst the ID model 
requires only one mechanism. Secondly, restricted 
look-ahead fails to delimit accurately the class of so- 
called garden path sentences (Milne, 1982; Briscoe, 
1983), whilst the ID account correctly predicts their 
"interactive" nature (Briscoe, 1982, 1984; Crain & 
Steedman, in press). Thirdly, look-ahead involves 
delaying decisions, a strategy which is made 
implausible, at least in the context of speech 
understanding, by the body of experimental results 
summarised by Tyler (1981), which suggest that 
morphosyntactic analysis is extremely rapid.
The generalisation of these results to a complete
model of SUS represents commitment to a research
programme which sets as its goal the discovery of
constraints on language which allow the associated
processing tasks to be implemented in an efficient and
tractable manner. What is advocated here, therefore,
is the development of a computational theory of
language processing derived through the study of
language from the perspective of these processing
tasks, much in the same way in which Marr (1982)
developed his computational theory of vision.
Acknowledgements: We would like to thank David 
Carter, Jane Robinson, Karen Sparck Jones and John 
Tait for their helpful comments. Mistakes remain our 
own. 
V REFERENCES 
Ades,A. and Steedman,M.(1982) 'On the Order of 
Words', Linguistics and Philosophy, vol.5, 320-363 
Balzer,R., Erman,L., London,P. and Williams,C.(1980) 
'HEARSAY-III: A Domain-Independent Framework for 
Expert Systems', Proceedings of the AAAI(1), 
Stanford, CA, pp.108-110 
Briscoe,E.(1982) 'Garden Path Sentences or Garden 
Path Utterances?', Cambridge Papers in Phonetics 
and Experimental Linguistics, vol.I, 1-9 
Briscoe,E.(1983) 'Determinism and its implementation 
in Parsifal' in Sparck Jones,K. and Wilks,Y.(eds.), 
Automatic Natural Language Parsing, Ellis 
Horwood, Chichester, pp.61-68 
Briscoe,E.(1984) Towards an Understanding of Spoken 
Sentence Comprehension: The Interactive 
Determinism Hypothesis, Doctoral Thesis, 
Cambridge University 
Church,K.(1980) On Memory Limitations in Natural 
Language Processing, MIT/LCS/TR-245 
Cole,R. and Jakimek,J.(1980) 'A Model of Speech 
Perception' in Cole,R.(ed.), Perception and 
Production of Fluent Speech, Lawrence Erlbaum, 
New Jersey 
Cooper,W. and Paccia-Cooper,J.(1980) Syntax and 
Speech, Harvard University Press, Cambridge, Mass 
Cooper,W. and Sorensen,J.(1981) Fundamental 
Frequency in Sentence Production, Springer 
Verlag, New York 
Crain,S. and Steedman,M.(In press) 'On Not Being Led 
Up the Garden Path: the Use of Context by the 
Psychological Parser' in Dowty,D., Karttunen,L. 
and Zwicky,A.(eds.), Natural Language Processing, 
Cambridge University Press, Cambridge 
Erman,L., Hayes-Roth,F., Lesser,V. and Reddy,R.(1980) 
'The Hearsay-II Speech Understanding System: 
Integrating Knowledge to Resolve Uncertainty', 
Computing Surveys, vol.12, 213-253 
Erman,L. and Lesser,V.(1975) 'A Multi-Level 
Organisation for Problem Solving Using Many, 
Diverse, Cooperating Sources of Knowledge', 
Proceedings of the 4th IJCAI, Tbilisi, Georgia, 
pp.483-490 
Frazier,L.(1979) On Comprehending Sentences: 
Syntactic Parsing Strategies, IULC, Bloomington, 
Indiana 
Hayes-Roth,B.(1983a) A Blackboard Model of Control, 
Report No.HPP-83-38, Department of Computer 
Science, Stanford University 
Hayes-Roth,B.(1983b) The Blackboard Architecture: A 
General Framework for Problem Solving?, Report 
No.HPP-83-30, Department of Computer Science, 
Stanford University 
Kimball,J.(1973) 'Seven Principles of Surface Structure 
Parsing in Natural Language', Cognition, vol.2, 
15-47 
Lea,W.(1980) 'Prosodic Aids to Speech Recognition' in 
Lea,W.(ed.), Trends in Speech Recognition, 
Prentice Hall, New Jersey, pp.166-205 
Marcus,M.(1980) A Theory of Syntactic Recognition for 
Natural Language, MIT Press, Cambridge, Mass. 
Marr,D.(1982) Vision, W.H.Freeman and Co., San 
Francisco 
Marslen-Wilson,W. and Tyler,L.(1980) 'The Temporal 
Structure of Spoken Language Understanding: the 
Perception of Sentences and Words in Sentences', 
Cognition, vol.8, 1-74 
Martin,W., Church,K. and Patil,R.(1982) Preliminary 
Analysis of a Breadth-First Parsing Algorithm: 
Theoretical and Experimental Results, 
MIT/LCS/TR-261 
Milne,R.(1982) 'Predicting Garden Path Sentences', 
Cognitive Science, vol.6, 349-373 
Pereira,F.(In press) 'A New Characterization of 
Attachment Preferences' in Dowty,D., Karttunen,L. 
and Zwicky,A.(eds.), Natural Language Processing, 
Cambridge University Press, Cambridge 
Selkirk,E.(1983) The Syntax of Words, MIT Press, 
Cambridge Mass. 
Shieber,S.(1983) 'Sentence Disambiguation by a 
Shift-Reduce Parsing Technique', Proceedings of the 
21st Annual Meeting of ACL, Cambridge, Mass, 
pp.113-118 
Reddy,R. and Erman,L.(1975) 'Tutorial on System 
Organisation for Speech Understanding' in 
Reddy,R.(ed.), Speech Recognition: Invited Papers 
of the IEEE Symposium, Academic Press, New 
York, pp.457-479 
Tyler,L.(1981) 'Serial and Interactive-Parallel Theories 
of Sentence Processing', Theoretical Linguistics, 
vol.8, 29-65 
Warren,P.(1983) 'Temporal and Non-Temporal Cues to 
Sentence Structure', Cambridge Papers in 
Phonetics and Experimental Linguistics, vol.II 
Warren,P.(In prep) Durational Factors in Speech 
Processing, Doctoral Thesis, Cambridge University 
