A Finite-Slate Parser for Use in Speech Recognition 
Kenneth W. Church 
NE43-307 
Massachusetts Institute of Technology 
Cambridge, MA. 02139 
This paper is divided into two parts. 1 The first section motivates 
the application of finite-state parsing techniques at the phonetic level in 
order to exploit certain classes or" contextual constraints. -In the second 
section, the parsing framework is extended in order to account \['or 
'feature spreading' (i:.g., agreement and co-articulation) in a natural 
way. 
I. Parsing at the Phonetic Level 
It is well known that phonemcs have different acoustic/phonetic 
realizations depending on the context. Fur example, the phoneme/t/ 
is typically realized with a different allophone (phonetic variant) in 
syllable initial position than in syllable final position. In syllable initial 
position (e.g., Tom),/t/is almost always released (with a strong burst of 
energy) and aspirated (with h-like noise), whereas in syllable final 
position (e.g., cat.), /t/ is often unreleased and unaspirated_ It is 
common practice in speech research to distinguish acoustic/phonetic 
properties that vary a great deal with context (e.g., release and 
aspiration) from those that are relatively invariant to context (e.g., 
place, manner and voicing). 2 In the past, the emphasis has been on 
invariants; allophonic variation is traditionally seen as problematic for 
recognition. 
(I) "In most systems for sentence recognition, such modifications 
must be viewed as a kind of 'noise' that makes it more difficult 
to hypothesize lexical candidates given an input phonetic 
transcription. To see that this must be the case, we note that 
each phonological rule \[in an example to be presented below\] 
l, This research was ~pported (in part) by the National Institutes of I lealth Grant No. 1 
POt I M 03374-01 and 03374-02 from the National Library of Medicine, 
2. Place refers IO the location of the constriction in the vocal tracL Examples include: 
labial t'at the hpsl/p, b. f, ',. m/, velar/k, g. r~/, dental (at the teeth)/s, z, t. d, I, n/and 
palatal A, ;~, i:,'}/ Manner dislmgu~shes among vowels, liquids and slides (e.g., /1, r, y. 
w/t. fricatives le.s.,/s, z, f. v/t, nasals (e.g.,/n. m. rio and stops leg,/p, t, k, b, d, g/). 
Voietng (periodie ~,ibration of the vocal fold.s) distingmshes sounds like /b, d. S/ from 
sounds like/p, L, k./. 
results in irreversible ambiguity - the phonological rule does 
not have a unique inverse that cuuld be used to recover the 
underlying phonemic representation for a ie,xical item. l:or 
example .... schwa vowels could be the first vowel in a word like 
'about' or the surface realization of almost any English vowel 
appearing in a sufficiently destressed word. The tongue tlap \[El 
could have come from a /t/ or a /d/." Klatt (MIT) 
\[21, pp. 548-5491 
This view of allophonic variation is representative of much of the 
speech recognition literature, especially during the ARPA speech 
project. One can find similar statements by Cole and Jakim~k ICMU) 
\[5\] and by Jelinek (IBM)\[17\]. 
I prefer to think of variation as usefid. It is well known that atlo- 
phonic contrasts can be distinctive, as illustrated by the following 
famous minimal pairs where the crucial distinctions seem to lie in the 
allophonic realization of the/t/: 
(2at a tease / at ease aspirated / flapped 
(2b) night rate / ni-trate unreteased/retroflexed 
(2c) great wine / gray twine unreteased/rounded 
This evidence suggests that allophonic variation provides a tich source 
of constraints on syllable structure and word stress. The recognizer to 
be discussed here (and partly tmplcmented in Church \[4\]) is designed to 
exploit allophonic and phonotactic cues by parsing the input utterance 
into syllables and other suprasegmental constituents using phrase- 
structure parsing techniques. 
1.1 An Example of Lexical Retrieval 
It might be helpful to work out an example it\] order to illustrate 
how parsing can play a role in l.exica\] retrieval. Consider the phonetic 
transcription, mentioned above in the citation from Klatt \[20, p. 1346\] 
\[2\], pp. 548-549J: 
91 
(3) \[dD~hlf_lt) tam\] 
It is desired to decode (3) into the string ofwords: 
(4) Did you hit it to Tom? 
In practice, the lexical retrieval problem is complicated by errors in the 
front cad. However, even with an ideal error-free front-end, it is 
difficult to decode (3) because, among other things, there are extensive 
nile-governed changes affecting the way that words are pronounced in 
different sentence contexts, as Klatt's example illustrates: 
(5a) Pabtalization of/d/before/y/in didyou 
(5b) Reduction of unstressed/u/to schwa in),~u 
(5c) Flapping of intervocalic /t/ in hit. it 
(5d) Reduction of schwa and devoicing of/u/in to 
(5e) Reduc:ion of geminate/t/in it. to 
These allophonic processes often appear to neutralize phonemic 
distinctions. For example, the voicing contrast between/t/ and/d/. 
which is usually distinctive, is almost completely lost in wr~er/rid_er, 
where bod~ /t/ and /d/ are realized in American English with a tongue 
~ap (q. 
1.2 .\n Ogtimistic "v'icw of Neutralization 
Fortunately, there are many fewer cases of true neutralization 
than it might seem. Even in writ.er/ri~.er, the voicing contrast is not 
completely lost. The vowel in rider tends to be longer than the vowel in 
w~ter due to a general process that lengthens vowels before voiced 
consonants (e.g., /d/) and shortens them before unvoiced consonants 
(e.g.,/t/). 
A similar lengthening argument can be used to separate In/and 
/ndl (at least in some cases). It tmght be suggested that In/is merged 
with/nd/by a/d/deletion rule that applies in words like mena~ wind 
(noun). wind (',erbL and find. (Admittedly there is little if any direct 
acoustic evidence fi)r a/d/segment in this environment.) However, \[ 
suspect that these words can o)~en be distinguished from men, win. 
)vttte. and fine mostly on the basis of the duration of the nasal murmur 
which is lengthened in the precedence of a voiced obstruent like/d/. 
Thus, this /d/-detction process is probably not a true case of 
neutralization, 
Recent studies in acoustic/phonetics seem to indicate that more 
and more cases of apparent neutralization can be separated as the field 
progresses. For instance, it has been said that/s/merges with f~/in a 
context like ga~ shortage \[12\]. lh)we~cr, a recent experiment 1271 
suggests that the/s~/sequence can be distinguished from /~,~/ las in 
fisth shortage) on the basis of a spectral tilt: the /s,~/'spectrum is more 
/s/-like in the beginning and more/~,/-like at the cad, whereas the f~ 
spectrum is relatively constant throughout. A similar spectral tilt 
argument can be used to separate other cases of apparent gemination 
(e.g../z~'/in ~ the). 
As a final example of apparent ncutra!ization, consider the 
portion of the spectrogram in Figure !, between 0.85 and 1.1 seconds. 
This corresponds to the two adjacent /t/s in Did you hit it to Tom? 
Klatt analyzed this region with a single geminated/t/. However, upon 
further investigation of the spectrum, I believe that there are acoustic 
cues for two segments. Note especially the total energy, which displays 
two peaks at 0.95 and 1.02 seconds. On the basis of this evidence, I will 
replace Klatt's transcription (6a) with (6b): 
(6a) \[dl\]ahlf.lu taml 
(6b) \[dl\]i}hll'I t tlmml 
U 
1.3 Parsing and Matching 
Even though 1 might be able to re-interpret many cases of 
apparent neutralization, it remains extremely difficult to "undo" the 
allophonic rules by inverse transformational parsing techniques. Let 
me suggest an alternative proposal, l will treat syllable structure as an 
intermediate level of representation between the input segment lattice 
and ',he output word lattice. In so doing, I have replaced .:.he lexical 
retrieval problem with two (hopefully simpler) problems: (a) parse the 
segment lattice into syllable structure, and (b) match the resulting 
constituents a~ainst the lexicon. I will illustrate the approach with 
Fig. I. Did you hit it to Tom? ,-,~.(..~.) 
o,0 Pit oiZ . oi.~ 0.4 06 o.e 0.7 O.a 0.9 l.o 1.I t,Z 1.3 :.4 l.e 
as ,:~o'; Laer¢~ -- t~,6HIm76OH8 ......... 
-,o~ ..... --~-~-,;---~-'~- ;';' i'L " .... ;" ~'~'~:"~ 
,,ill , Igll,, , .I 
r 
dl 
i Wavetom ~ ~ ~IL . ~ ~,.. 
I ._ J.~ L , I', I .. t I , L -t_~! I -.1 L.\] I l I I 
Did you hit it to Tom 
92 
Klatt's example (enlu, nced with allophonic diacritics to show aspiration 
and glottalization): 
(7) \[drjighlff tht thaml 
TTr 
Using phonotactic and allophonic constraints on syllable structure such 
as: 3 
(8a) /h/is always syllable initial, phonotactic 
(8b) \[1" I is always syllable final, allophonic 
(8c) \[?\] is always syllable final, and allophonie 
(Sd) \[t h\] is always syllable initial, allophonic 
the parser can insert the following syllable boundaries: 
(9) \[di~} # hlf. # I ? # tht # tham\] 
It is now it is relatively easy to decode the utterance with lcxical 
matching routines similar to those in Smith's Noah program at CMU 
{241. 
parsed transcription, decodinl 
dl\]~ -...¢ did you 
hlf= --..* hit 
l ? -=+ it 
th) ---., to 
tham ---, Tom 
In summary, I believe that the lexical retrieval device will be in a 
superior position to hypothesize word candidates if it exploits allo- 
phonic and phonotactic constraints on syllable structure. 
1.4 Exploiting Redund:mey 
In many cases, atlophonic and phonotacdc constraints are 
redundant, Even if the parser should miss a few of the cues for syll~ibie 
structure, it will often be able to find the correct structure by taking 
advantage of some other redundam cue. \[:or example, suppose that the 
front end failed to notice die glottalized/t./in the word it. 
(10) dl\]i9 #hlf_# I #tha #tham 
T 
The parser could deduce that the input transcription (10) is internally 
inconsistent, because of a phonotactic constraint on the lax vowel/I/. 
3. This formulation of the eonst/'aints is oversimplified for exlx3,sltory convenience; see 
\[10. lJ. 15\] and references thereto for discussion of the more subtle issues. 
Lax vowels are restricted to closed syllables (sylkdgles ending in a 
consonant) \[I\]. However, in this case, /1/ cannot mcct the closed 
syllable restriction because the following consonant is aspirated (arid 
therefi)re syllable initial). Thus the transcription is internally 
inconsistent. The parser shotlld probably rejcct tbc transcriot;¢,n ~md 
hope that the front end can fix dxe problem. Alternatively, the parser 
might attempt to correct the error by hypothesizing a second/t/. 4 
There are many other examples like (10) where phonotactic 
constraints and allophonic constraints overlap. Consider the pairs 
found in figure 2, where there are multiple arguments for assigning the 
crucial syllable boundary. In de-prive vs. dep-rivalion, for instance, the 
difference is revealed by the vowel argument above 5 and by the 
aspiration rule. 6 In addition, the stress contrast will probably be cor- 
related with a number of so-called 'suprasegmental' cues, e.g., duration, 
fundamental frequency, and intensity \[81. 
In general, there seem to be a large number of multiple low level 
cues for syllable strt,cture. This observation, if correct, could be viewed 
as a form of a 'constituency hypothesis'. Just as syntacticians have 
argued for the constituent-hood of noun phrases, verb phrases and 
sentences on the grounds that these constituents seem to capture crucial 
linguistic generalizations (e.g., question formation, wh-movement), so 
too, I might argue (along with certain phonologists such as Kahn \[13\]) 
that syllables, onsets, and rhymes are constituents because they also 
capture important generalizations such as aspiration, tensing and laxing. 
If this constituency hypothesis for phonology is correct (and I believe 
Fig. 2. Some Structural Contrnsts 
r ! _w 
t2 de-prive 
dep-rivation 
t a-ttribute 
att-ribute 
li de-crease 
dec-riment 
b cele-bration 
celcb-rity 
d a-ddress 
add-tess 
g de-grade 
deg-radation 
di-plomacy 
dip-lumatic 
de-cline a-cquire 
dec-lination acq-uisition 
o-bligatory 
ob-ligation 
4. Personally. 1 favor the first alternative: after years of ,.,.smessmg Victor Zue read 
spectrograms. I have become most tmpressed with the richness of low level phonetic cues. 
5. The syllable de. is open because the vowel is tense (diphthongizcd): dep" is dosed 
because the vowel is lax 
6. lhe /p/ m -prtve is syllable inttml because it ts a.sptrated whereas the /p/ in dep" is 
s) liable final because it is unaspirated. 
93 
that it is) then it seems F~atural to propose a syllabic parser fi)r 
proccssit~g speech, by analogy with sentence parsers that have bccome 
standard practicc in d~e natural laoguagc community for processing 
.~ext. 
2. Parser Implementation and Feature Spreading 
A program has bcen implcmcntcd \[41 which parses a lattice of 
phonetic segmcnts into a lattice of syllables and other phonological 
constituents. Except for its novcl mechanism for handling features, it is 
very much like a standard chart parser (e.g.. Earley's Algorithm lTD. 
P, ccall that a chart parser takes as input a sentence and a context-free 
grammar and produces as output a chart like that below, indicating the 
starting point and ending point of each phrase in the input string. 
lnput~ Sentenc(l: 0 They t are 2 flying 3 planes 4 
Gram.mar: 
N "---* they V ---* are N --* tl¥ing 
A -"* flying V ---* flying N --~ planes 
S --* NP VP VP -..* V NP VP ---.~ V VP 
NP~ N NP~ APNP NP"-* VP AP -'* A 
('n,,.rt: 
o 
o(} 
i!1} 
2!{} 
I 2 3 # 
{Xt',N,they} {S} {S} {S} 
{ } {VP.V.are) {VP} (VP} 
{ } \[ } {NP.VP,AP,N.V,A,flying| {NP.VP} 
( } { } ( } {NP, N.planesl 
{} {} {} {} 
bLach entry in the chart represents the possible analyses of the input 
words between a start position (the row index) and a finish position (the 
column index). \[-'or example, the entry {NP, VP} in Chart(2,4) 
represents two alternative analyses of the words between 2 and 4: 
\[xp fi3ulg pia,esl add \[vp flying planesl. 
.the same parsing methods can be used to find syllable structure 
from an input transcription. 
lod)u\[ Sentence: O ~" £ t 2 S 3 l 4 Z 5 (this ~) 
Grammar: 
onset~ ~'\[SIZ peak---) i t\[ 
coda --.--) ~' \[ S I Z syl ----) (onset) peak (coda) 
Chart: 
0 J , H 
o{} 
t{} 
z{} 
st} 
4{} 
s(I 
I 2 3 4 .~ , 
{\[.onset.coda} {syl} {syl} { } { } 
{ } {!,pcak.syl} {syl) { } { } 
{ } { } {S.onset.codal (syl} {syl} 
{ } { } { } {l,peak.syl} {syl} 
{ } { } { } { } {Z, onset.coda) 
{} (} (I {} (} 
This chart shows that the input sentence can be decomposed into two 
syllables, one from 0 to 3 (this) and another one from 4 to 5 (is). 
Alternatively, the input sentence can be decomposed into \[~'t\]\[slzl. In 
this way. standard chart parsing techniques can be adopted to process 
allophonic and phonotactic constraints, if the constraints are 
reformulated in terms of a grammar. 
How can allophonic and phonotactic constraints be cast in terms 
of context-free rules? In many cases, the constraints can be carried over 
in a straightforward way. For example, the following set of roles 
express the aspiration constraint discussed above. These rules allow 
aspiration in syllable initial position (under the onset node), but not in 
syllable final position (under the coda). 
(lla) uttcrancc ---) syllable* 
(lib) syllable ~ (onset) peak (coda) 
(II.c) onset --* aspirated-t \[ aspirated-k I aspirated-p I.,. 
(lld) coda---, unrelcascd-t I unrclcased-k I unrcleased-p I-.- 
The aspiration constraint (as stated above) is relatively easy to cast in 
terms of context-free rules. Other allophonic and pho~aotactic processes 
may be more difficult. 7 
2..1 The Agreement Problem 
In particular, context-free roles are generally considered to be 
awkward for expressing agreement facts. For example, in order to 
express subject-verb agreement in "'pure" context-free rules, it is 
probably necessary to expand the rule S ~ NP VP into two cases: 
(12a) S ---* singular-NP singular-VP singular case 
(12b) S --) plural-NP plural-YP plural case 
7. For example, there may be a problem with constraintS that depend on rule ordering, 
since rule ordenng is not supported in the context-free formalism. This topic is discussed 
at length in I41. 
94 
The agreement problem also arises in phonology. Consider the 
example of homorganic nasal clusters (e.g., cam2II2, can't, sank), where 
the nasal agrees with the following obstruent in place of articulation. 
That is, the labial nasal /m/ is found before the labial stop /p/, the 
cor9nal nasal/n/ before the coronal stop/t/, and the velar nasal/7// 
before the velar stop/k/. This constraint, like subject-verb agreement. 
poses a problem for pure unaugmented context-free rules; it seems to 
be necessary to expand out each of the three cases: 
(13a) homorganic-nasal-cluster ~ labial-nasal labial-obstruent 
(13b) homorganie-nasal-cluster ~ coronal-nasal coronal-obstruent 
(13c) homorganic-nasal-cluster---* velar-nasal velar-obstruent 
In an effort to alleviate this expansion problem, many researchers have 
proposed augmentations of various sorts (e.g., ATN registers \[26\], LFG 
constraint equations \[16\], GPSG recta-rules till, local constraints \[18\], 
bit vectors \[6, 22\]). My own solution will be suggested after I have had 
a chance to describe the parser in further detail. 
2..2 A Parser Based on Matrix Operations 
This scction will show how the grammar can be implemented in 
terms of operations on binary matrices. Suppose that the chart is 
decomposed into a sum of binary matrices: 
(14) Chart = syl Msy I + onset Monse t + peak Mpeak + .,. 
where Msy I is a binary matrix 8 describing the location of syllables and 
Monse t is a binary matrix describing the location of onsets, and so forth. 
Each of these binary matrices has a I in position (i,j) if there is a 
constituent of the appropriate part of speech spanning from the i m 
position in the input sentence to the jth position.9 (See figure 3). 
Ph'rase-structure rules will be implemented with simple oper- 
ations on these binary matrices. For example, the homorganic rule (13) 
could be implemented as: 
8. Fhese matnccs will sometimes be called segmentatton lattices for historical reasons. 
Techmcally. these matnc~ need not conform to the restrictions of a lattice, and therefore, 
the weaker term graph L~ more correcL 
9 In a probabitisuc framework, one could replace all of the I's and 0's with probabdities. 
A high prohabdity m loeauon (i. j~ of the s),liable matnx would say that there probably is 
a ss'llahle from postuon t to position 1: a low probabdity would say that there probably 
isn't a syllable between i and 1. Most of the following apphcs to probabdity matrices 
welt as binary ntawices, though the probabdity matnces may be less sparse and 
consequently less efficient. 
Fig. 3. Msyl, Monse e and Mdtyme for: "O '~ I t Z s 3 I 4 z 5" 
001100 010000 000000 
001100 000000 001100 
000011 000100 000000 
000011 000001 000011 
000000 000000 000000 
000000 000000 000000 
The matrices tend to be very sparse (ahnost entirely full of 0's) because 
syllable grammars are highly constrained. In principle, there could be 
n 2 entries. However, it can be shown that e (the number of l's) is 
linearly related to n because syllables have finite length. In Church \[4\], 
I sharpen this result by arguing that e tends to be bounded by 4n as a 
consequence ofa phonotactic principle known as sonority. Many more 
edges will be ruled out by a number of other linguistic constraints 
mentioned above: voicing and place assimilation, aspiration, flapping. 
etc. In short, these mamces are sparse because allophonic and phono- 
tactic constraints are useful 
(15) (setq homorganic-nasal-lattice 
(M + (M* (phoneme-lattice #/m)labial-lattice) 
(M* (phoneme-lattice #/n) coronal-lattice) 
(M* (phoneme-lattice #/G) velar-lattice))) 
illustrating tile use of M + (matrix additit)n) ttt express the uniun of 
several alternatives and M* (matrix multiplication) to express the 
concatenation of subparts. It is well known that any finite-state 
grammar could be implemented in this way with just three matrix 
operations: M,, M+, and M** (transitive closure). If context-free 
power were required, Valient's algorithm \[25\] could be employed. 
However, since there doesn't seem to be a need tbr additional 
generative capacity in speech applications, the system is restricted to 
handle only the simpler finite state case. 1° 
2..3 Feature Manipulation 
Although "pure" unaugmented finite state grammars may be 
adequate fur speech applications (in the weak generative capacity 
sense), \[ may, nevertheless, wish to introduce additional mechanism in 
order to account for agreement facts in a natural way. As discussed 
above, the formulation of the homorganic rule in (15) is unattractive 
because it splits the rule into three cases, one for each place of 
articulation. It would be preferable to state the agreement constraint 
just once, by defining a homorganic nasal cluster to be a nasal cluster 
\]0. I personally hold a much more controversial posution, that tinite state grammars are 
sufficient for most. if not nil, natural language )-asks \[3\]. 
95 
subject to phlcc assimilation. In my language of matrix operations, I 
can say just exactly that: 
(16) (setq homorganic-na~l-cluster-lattice 
(M& nasal-cluster-lattice 
place-assimilation)) 
where M& (element-wise intersection) implements the subject to 
constraint. Nasal-cluster and place-assimilation are defined as: 
(17a) (setq nasal-cluster-lattice 
(M. nasal-lattice obstruent-lattice)) 
(17b) (setq place-assimilation-lattice 
(M + (M** labial-lattice) 
(M" dental-lattice) 
(M'" velar-lattice))) 
In this way. M& seems to be an attractive solution to the agreement 
problem. 
In addition, M& might also shed some light on co-articulation, 
another problem of'feature spreading'. Co-articulation (articulation of 
multiple phonemes at the same time) makes it extremely difficult 
(perhaps impossible) to segment the speech waveform into phoneme- 
co-articulation, Fujimura su~csts that place, manner and other 
articulatory features be thought of as asynchronous processes, which 
have a certain amotmt of freedom to overlap in time. 
(tSa) "Speech is commonly viewed as the result of concatenating 
phonetic segments. In most discussions of the temporal 
structure of speech, a segment in such a model is assumed to 
represent a phoneme-sized phonetic unit. which possesses an 
inherent \[invariantj target value in terms of articulation or 
acoustic manifestation. Any deviation from such an 
interpretation of observed phenomena requires special 
attention ... \[Biased on some preliminary results of X-ray 
microbeam studies \[which associate lip, tongue and jaw 
movements with phonetic events in the utteranceJ, it will be 
suggested that understanding articulator'/ processes, which are 
inherently multi-dimensional \[and (more or less) asynchrouousl, 
may be essential for a successful description of temporal 
structures of speech." \[9 p. 66\] 
In light of Fujimura's suggestion, I might re-interpret my parser as a 
highly parallel feature-based asynchronous architecture. For example. 
the parser can process homorganic nasal clusters by processing place 
and manner phrases in parallel, and then synchronizing the results at 
the coda node with M&. That is, (17a) can be computed in parallel with 
(17b). mid then the rcsulLs are aligned whcn the coda is computed with 
(16), as illustrated below for the word tent. Imagine that the front end 
produces the following analysis: 
(19) t a n t 
dental: I-I I ..... 
vowel: I-..I 
stop: I.I I ..... I 
nasalization: I..I 
where many of the ~atures overlap m an asynchronous way. The 
parser will correctly locate the coda by intersecting the nasal cluster 
lattice (computed with (17a)) with the homorganic lattice (computed 
with (17b)). 
(20) t a n t 
nasal cluster: I ....... J 
homonganJc: I ..... I 
coda: I ..... I 
This parser is a bold departure from a standard practice in two respects: 
(1) the input stream is feature-based rather than segmental, and (2) the 
output parse is a heterarchy of overlapping constituents (e.g., place and 
manner phrases) as opposed to a list of hierarchical parse-trees. \[ find 
these two modifications most exciting and worthy of further 
investigation. 
In summary, two points have been made. \[:irst. I suggested the 
use of parsing techniques at the segmental/feature level in speech 
applications. Secondly, I introduced M& as a possible solution to the 
agreement/co-articulation problem. 
3. Ack,mwledgements 
l have received a considerable amount of help and support over 
the course of this project. Let me mention just a few of the people that 
I should thank: Jon Allen, Glenn Burke, Francine Chen, Scott Cyphers, 
Sarah I-ergt,son..,'vlargaret Fleck, Dan Huttenlocher, Jay Kcyser, Lori 
LameL Ramesh Patil. Janet Pierrehumbert, Dave Shipman, Pete 
Szolovits. Meg Withgott and Victor Zue. 
References 
1. Bamwell, T., An Algorithm for Segment Durations in a 
Reading Machine Context, unpublished doctoral dis- 
sertation, department of Electrical Engineering and 
Computer Science, M1T. 1970. 
L Chomsky. N. and Halle, M., The Sound Pattern of~'nglish, 
Harper & R.ow, 1968. 
3. Church, K., On Memoo' Limitations in Natural Language 
Processing, MS Thesis, MIT, Mr\['/I,CS/TR-245, 1980 
(also available from Indiana University Linguistics Club). 
96 
4. Church, K., Phrase-Structure l'arsing: A Method lbr Taking 
Advantage of Allophonic Constraints, unpublished 
doctoral dissertation, department of I-',lectrical Engineering 
and Computer Science, MIT, 1983 (also to appear, I.CS 
and RLE publications, MIT). 
5. Cole, R., and Jakimik, J., A Model of Speech Perception, in 
R. Cole (ed.). Perception and l'roduction of Fluent Speech, 
Lawrence Erlbaum, HiIlsdale, N.J., 1980. 
6. Dostert. B., and Thompson, F., How Features Resolve 
Syntactic Ambiguity, in Proceedings of the Symposium on 
Information Storage and Retrieval, Minker. J., and 
Rosenfeld, S. (¢d.), 1971. 
7. Farley, J., An Efficient Context-Free Parsing Algorithm, 
CACM, 13:2, February, 1970. 
8. Fry, D., Duration and Intensity as Physical Correlates of 
Linguistic Stress, JASA 17:4, 1955, (reprinted in Lehiste 
(ed.), Readings in Acoustic l'honetics, MIT Press, 1967.) 
9. Fujimura, O., Temporal Organization of Articulatory Move- 
ments as Multidimensional Phrasal Structure, Phonetica, 
33: pp. 66-83, 1981. 
10. l-'ujimura, O., and Lovins. J., Syllables as Concatenative 
Phonetic UralS, Indiana University Linguistics Club, 1982. 
11. Gazdar, G., Phrase Structure Grammar, in P. Jacobson and 
G. Pullum (eds.), The Nature of Syntactic Representation, 
D. Rcidet, Dordrecht, in press, 1982. 
12..Heffner, R., General Phonetics, The University of 
Wisconsin Press, 1960. 
13. Kahn, D., Syllable-Based (ieneralizations ht lOtglish Pho- 
nology,, Indiana University Linguistics Club, 1976. 
14. Kiparsky, P., Remarks on the Metrical Structure of the Syl" 
lable, in W. Dressier (ed.) Phonologica 1980. Proceedings 
of the Fourth International Phonology Meeting 1981. 
15. Kiparsky, P., Metrical Structure, Assignments in Cyclic, 
Linguistic Inquiry, 10, pp. 421-441, 1979. 
16. Kaplan, R. and Bresnan, J., LexicabFunctional Grammar: 
A Formal System for Grammatical Representation, in 
Bresnan (ed.), The Mental Representation of Grammatical 
Relations, MIT Press. 1982. 
17. Jetinek, F., course notes, MIT, 1982. 
18. Joshi, A., and Levy, L.. Phrase Structure Trees Bear More 
Fruit Than You Would Have Thought, AJCL, 8: I, \[982. 
19. Klatt, D., Word Verification in a Speech Understanding 
System, in P,. R, eddy (ed.), Speech Recognition, Invited 
Papers Presented at the 1974 \[EEE Symposium, Academic 
Press, pp. 321-344, 1974. 
20. Klatt, D., Review of the ARPA Speech Understanding 
Project, JASA, 62:6, December 1977. 
ZI. Klatt, D., Scriber and Lal's: Two New Approaches to Speech 
Analysis, chapter 25 in W. Lea, Trends in Speech Recog. 
ration, Prentice-Hall, 1980. 
22. Martin, W., Church, K., and Patil, R., Prelhninary Analysis 
of a Breadth-First Parsing Algorithm: Theoretical attd Ex" 
permwntal Results, MI'I'/LCS/'I'R-261, 1981 (also to 
appear in I..Bolc (ed.), Natural language Parsing 
Systems, Macmillan, \[.ondon). 
23. Reddv R., Speech Recognition by Machine: A Review, 
Proceedings of the IEEE, pp. 501-531, April 1976, 
~. Smith, A., Word flypothesization in the Ilearsay-ll Speech 
System, Proc. IEEE Int, Conf. ASSP, pp. 549-552, 1976. 
25. Valient, l.., General Context Free Recognition in Less Than 
Cubic Time, J. Computer and System Sciences 10, pp. 308- 
315, 1975. 
26. Woods, W., Transition Network Grammars for Natural 
Language Analysis, CACM, 13:10, 1970. 
Z7. Zue, V., and Shattuck-Hufnagel, S., When is a ,/Ts/not a 
/3V?, ASA, Atlanta, 1980. 
97 
