American Journal of Computational Linguistics 
Microfiche 60
PITCH CONTOUR GENERATION
IN SPEECH SYNTHESIS 
A JUNCTION GRAMMAR APPROACH 
ALAN K. MELBY, WILLIAM J. STRONG,
ELDON G. LYTLE, AND RONALD MILLETT 
Translation Sciences Institute 
130 B-34 
Brigham Young University 
Provo, Utah 84602 
Copyright © 1977
Association for Computational Linguistics
SUMMARY 
Computer based text synthesis systems require a means for
generating sentence-level pitch contours. These contours must have a
certain degree of "human fidelity" if the synthetic speech is to sound
natural and not too machine-like. The pitch contours in currently
operational text synthesis systems are still not perfectly
natural-sounding and thus computer generation of pitch contours is a
topic of current interest. The introduction includes a survey of current
work in this area by researchers at MIT, Bell Labs, Stanford, etc.,
describing their general approaches.
The research described in this paper uses Junction Grammar as a
theoretical base, and Linear Predictor Coefficient (LPC) methods as an
analysis-synthesis technique. Motivations for these decisions are
presented.
Section I begins with an explanation of some sentences which
are being studied. For example, there is likely a stress on "study" in
the sentence "The boys who study get good grades," if the context is "but
the boys who don't get bad grades." On the other hand, if the context is
"but the girls who study get poor grades," then there is probably stress
on "boys." The various readings of "the boys who study..." and other
sentences are explained within the Junction Grammar framework. An
overview is given of a system for generating pitch contours for a
sentence from a Junction Grammar semantico-syntactic representation.
Section I also includes a description of an extension of
Junction Grammar which defines an object called an articulation tree,
corresponding to each junction tree. A junction tree contains
semantico-syntactic information but no lexical information. An
articulation tree contains segmental information about each lexical item
and suprasegmental or prosodic information combining the lexical items
into prosodic units. Semantic distinctions in junction trees are recoded
as distinctions in the prosodic structure of articulation trees and then
articulation trees are used to generate pitch contours. Junction trees
and articulation trees are included as figures for several sentences.
Section II describes how pitch contours are generated, including
the recoding of junction trees as articulation trees, the assignment of
initial and final pitch levels and pitch at nuclear syllables, and how
the generated contours are combined with analysis parameters and
synthesized into speech. It should be noted that the junction trees are
entered manually rather than by automatic analysis, in the current
implementation.
The text includes several graphs of natural pitch contours as
well as contours generated by the computer system.
The pitch contour system produces a synthesis output for each
reading of a sentence. Thirty-five sentences, some with natural, some
with hand-drawn, and some with machine-generated pitch contours were
evaluated for naturalness and "intelligibility" of intonation in four
types of tests. Results of testing several subjects showed that the
generated pitch contours were judged nearly as natural as human-produced
contours, and except for some specific problems involving duration, the
generated contours were intelligible in the sense of causing the listener
to perceive the intended reading of the sentence. The text includes a
quantitative summary of the results of the evaluation.
For the corpus of sentences treated so far, Junction Grammar 
provides a satisfactory theoretical base for generating pitch contours 
and defines some specific cases where pitch alone is insufficient to 
make distinctions and must be used with duration, pause and intensity. 
Appendices: 
A. Suggested background reading in acoustic speech processing 
and Junction Grammar. 
B. Glossary of terms, e.g. LPC, F0, Hertz, etc.
C. Description of the computer implementation (on a PDP-15
with a VT-15 graphics display unit).
D. More details on the evaluation procedure. 
For the convenience of the reader, a recent paper on Junction 
Theory presented at a BYU Linguistics Symposium is reprinted at the end 
of the microfiche. 
TABLE OF CONTENTS 
INTRODUCTION AND SURVEY OF RESEARCH IN PITCH CONTOURS
Section
I. THEORY
II. METHOD
III. EVALUATION AND DISCUSSION
REFERENCES
APPENDICES
A. BACKGROUND READING
D. MORE DETAILS ON THE EVALUATION
REPRINT OF ONE OF THE REFERENCES (Lytle, 1976)
INTRODUCTION 
All computer based text synthesis systems require a means for
generating sentence-level pitch contours. These contours must have a
certain degree of "human fidelity" if the synthetic speech is to sound
natural, that is, not too machine-like. The pitch contours in currently
operational text synthesis systems are still not perfectly
natural-sounding and thus computer generation of pitch contours is a
topic of current interest. This interest is shown, for example, by Allen
as he discusses pause and duration in text synthesis and then goes on to
say:
If temporal control presents great problems in the
description of speech, then the problems of fundamental
frequency (f0), or pitch control, are at least as
difficult. Once again, problems arise due to the fact
that the f0 is correlated with many factors, including
vowel tongue height, previous consonant, breath group
contour, syntactic and semantic content of words,
whether a sentence is a question, intonation effects,
and word boundary glottalization.
(Allen, 1976: 440)
Given the need for further research in pitch control, a 
question remains of how to approach the problem. The authors feel it 
is important to work within a linguistic model that interrelates 
semantic and phonetic phenomena. Later on in Allen's article he makes 
the following statement (which coincides with our philosophy): 
The current use of sophisticated means for pitch
recording, coupled with increased interaction between
linguistics and speech researchers, should, however,
lead to significantly improved pitch control programs
which are based on sound linguistically motivated
theory.
(Allen, 1976: 441)
The need for interaction between linguistics and speech
research is further explained by Umeda (1976: 450):
The message realization forms one structure as a whole.
Its constituents - acoustic realization, higher level
prosody, and syntax-semantics - interact with each other
very closely; a decision made at any level derives
immediately from the obtained result at the level
above, and affects a decision at the level below.
The remainder of this section consists of a survey of some of 
the current work in this area in the USA (at MIT, Bell Labs, and 
Stanford University), in Germany, and in the USSR. Then the section 
will conclude with an introduction to the present research. 
A. MIT 
At MIT, Allen (1976) is working on pitch control as an element
in his overall plan to produce a system capable of producing synthetic
speech from unrestricted English text. He points out that although a
syntactic and semantic analysis is needed, no existing automatic
algorithm can provide that analysis reliably for entire sentences of
unrestricted text. So he has elected to do a local analysis of the
sentence first and then tie together the local analyses into a sentence
level analysis if possible. The analyzer is thus designed so that if at
some point complete sentence analysis is blocked, the partial analyses
are still useful in generating the pitch contour and other prosodic
controls such as duration and pause. In response to the need for a
theoretical framework for relating a text and its pitch contour, Allen
is using the ideas of Halliday (1970) (e.g. discourse focus) to
investigate such questions as when and why elements of a verb string
are stressed. For example, he notes that the sentence "A farmer was
eating the carrot" will receive emphasis on "eating" if it is in response
to a question about what the farmer is doing. Allen correctly notes that:
The discovery and coordination of all these effects is a
large and continuing effort, and it is clear that
substantial semantic and discourse-level knowledge is
needed to correctly predict prosodic parameters.
(Allen, 1976: 441)
B. Bell Labs 
Several workers at Bell Labs have attacked the problem of
controlling pitch in speech synthesis. Olive (1975) describes a system
for generating pitch contours for the sentence type
"article-subject-verb-article-object" with an optional adjective on the
subject or object. His method for generating the pitch contour was to
record several sentences of the specified type using random words and to
average the natural pitch contours to obtain prototype contours. Then the
contour for each word was approximated by a fourth order polynomial to
"facilitate linear stretching and compression of the fundamental
frequency contour." Olive reports that by using this pitch contour
generation system, in conjunction with a word concatenation scheme in
which the words are stored in linear predictor coefficient (LPC) code,
the synthesized sentences were of high quality.
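The value of a polynomial prototype for "linear stretching and
compression" can be illustrated with a minimal sketch. It assumes the
prototype is defined on normalized time from 0 to 1, so that the same
coefficients can be sampled at any number of frames. The coefficient
values, the frame counts, and the function names are hypothetical and are
not taken from Olive (1975).

```python
def eval_poly(coeffs, t):
    """Evaluate a polynomial (coefficients in ascending order) at t by Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * t + c
    return result


def word_contour(coeffs, n_frames):
    """Sample a prototype contour, defined on normalized time 0..1, at n_frames points."""
    if n_frames < 1:
        raise ValueError("n_frames must be at least 1")
    if n_frames == 1:
        return [eval_poly(coeffs, 0.0)]
    return [eval_poly(coeffs, i / (n_frames - 1)) for i in range(n_frames)]


# Hypothetical fourth-order prototype (five coefficients, values in Hz).
PROTOTYPE = [120.0, 40.0, -60.0, 10.0, -5.0]

short_word = word_contour(PROTOTYPE, 5)  # compressed: 5 frames
long_word = word_contour(PROTOTYPE, 9)   # stretched: 9 frames
```

Both versions start and end at the same frequencies and share the same
shape; only the number of frames over which the shape is spread differs.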
Umeda, at Bell Labs, is also concerned with pitch contours,
asserting that "Among acoustic components, pitch (the fundamental
frequency of the voice) shows the most direct relation to higher level
prosody, stress and boundaries" (Umeda, 1976: 448). Umeda's algorithm
for controlling prosodic parameters is based on a syntactic analysis of
the input text. The analyzer fits each clause into a template consisting
of the following optional slots: sentence modifier, subject, verb, object
or complement, rail modifier, and punctuation mark. A point where the
above order of template elements is violated is marked as a boundary,
and boundaries are later used to assign pauses and intonation (Umeda,
1975).
C. Stanford University 
At Stanford University, there is a research project on generative
prosodics in the Institute for Mathematical Studies in the Social
Sciences (IMSSS). Researchers on this project are developing a system
which, ultimately, is intended to do synthesis in real time for use in
computer-assisted instruction at IMSSS (Levine, 1976). Their technique
is to compile a lexicon of words in LPC code (Atal and Hanauer, 1971)
and then, when a given sentence is to be synthesized, concatenate the
code for each word, adjusting durations and pitch contours as needed.
While Olive throws away the original pitch contour of each word, the
IMSSS approach is to adjust the original contour of the word and then
further smooth the contour so that each word will not sound
sentence-final.
The IMSSS group uses the ideas of Leben (1976), who relates
English prosody to tone languages in that he views both tone languages
and English as having a suprasegmental melody which is combined with
the segmental phonological elements. The IMSSS group (Levine, 1976: 3)
defines melody as a sequence of "autosegmental tones (autonomous from
the phonological segments) selected from the tonal repertoire of the
language."
These tones are treated theoretically as discrete fundamental
frequency levels, but then they are realized phonetically as continuous
contours. In order to assign tones to key syllables, a program analyzes
the sentence to be synthesized using a simple phrase structure grammar
which brackets phrases, clauses and other complex constituents, and
indicates boundaries between major constituents.
D. Germany 
Complementary to pitch contour generation is the study of the
perception of pitch contours.
In Germany, Isacenko and Schadlich (1970) performed an interesting
series of experiments on the perception of German intonation.
Natural sentences illustrating different intonation patterns were
recorded and monotonised at various fundamental frequencies (e.g. 150
Hertz and 178.6 Hertz). Then the tapes of the monotone versions were cut
and spliced at various points. The spliced tapes thus had an artificially
simplified intonation of exactly two tone levels. The team found that
they could change the way listeners perceived certain ambiguous sentences
by changing only the points at which tone switches occurred.
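A two-level contour of this kind is easy to state in code. The sketch
below is only an illustration of the idea of a contour determined by its
switch points; the frame-based representation and the function name are
assumptions, and the two default frequencies are the example values
quoted above.

```python
def two_level_contour(n_frames, switch_frames, start_hz=150.0, other_hz=178.6):
    """Build a contour that alternates between two tone levels.

    The contour begins at start_hz and toggles to the other level at each
    frame index listed in switch_frames.
    """
    levels = (start_hz, other_hz)
    switches = set(switch_frames)
    current = 0
    contour = []
    for frame in range(n_frames):
        if frame in switches:
            current = 1 - current  # toggle between the two levels
        contour.append(levels[current])
    return contour


# Same length and same two levels; only the switch points differ.
reading_a = two_level_contour(6, [2, 4])
reading_b = two_level_contour(6, [1, 5])
```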
E. USSR 
In the USSR, fiaavel et al. (1976) have also performed some
experiments in manipulating pitch contours while leaving other parameters
constant. They are interested in finding ways to "decrease the amount
of information necessary for the description of pitch curves without
distorting the parameters interpreted by man as prosodic characteristics
of a sentence." They base this search on the assumption that man has only
a limited short term memory available for storing the pitch contour and
so makes decisions concerning the prosody of a sentence by extracting
prosodic features which contain considerably less information than that
needed to reconstruct exactly the same pitch contour. They conclude
from these experiments that decisions such as declarative versus
interrogative are based on the position of the rise or fall in pitch
and not on the difference in pitch from high to low. They also conclude
that in determining emphasis, the position of the peak value of the
second derivative of the pitch contour is very significant.
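For a sampled contour, the second derivative can be approximated by a
central second difference. The following sketch locates the frame where
that difference is largest in magnitude. Treating "peak" as largest
magnitude, and the frame-based contour itself, are assumptions made for
illustration and are not details reported in the work surveyed.

```python
def second_difference(contour):
    """Central second difference at each interior frame of a sampled contour."""
    return [
        contour[i - 1] - 2 * contour[i] + contour[i + 1]
        for i in range(1, len(contour) - 1)
    ]


def peak_second_difference_frame(contour):
    """Return the frame index where the second difference is largest in magnitude."""
    if len(contour) < 3:
        raise ValueError("need at least three frames")
    d2 = second_difference(contour)
    best = max(range(len(d2)), key=lambda i: abs(d2[i]))
    return best + 1  # d2[0] belongs to frame 1 of the contour


# A step from 100 Hz to 130 Hz between frames 2 and 3.
step = [100, 100, 100, 130, 130, 130]
frame = peak_second_difference_frame(step)
```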
F. Brigham Young University (BYU) 
The research in pitch contour generation to be described in 
this paper addresses basically the same questions as the various projects 
surveyed above: 
(1) What theoretical base might one use to represent syntactic and
semantic information?
(2) How does one convert linguistic information, both at
sentence-level and discourse-level, to the algorithmic control of
prosodic parameters?
(3) What aspects of the pitch contour (e.g. 1st and 2nd derivatives,
transitions relative to key syllables, and actual frequency)
are significant in causing intonation and emphasis options to
be perceived?
(4) What synthesis technique should be used to incorporate the
prosodic controls into a working system (e.g. LPC synthesis,
formant synthesis, or articulatory synthesis)?
We have chosen to use Junction Grammar (JG) as a theoretical
framework within which to look for answers to questions (1) and (2)
above. Junction Grammar refers to a linguistic model formulated by
Lytle (1974). Subsequently, Junction Theory has been used to formulate
a new theory of phonology in which a semantico-syntactic representation
(called a junction-tree) is recoded as a general articulatory
representation (called an articulation-tree) (Lytle, 1976). Junction
Grammar extended to include Junction Phonology was selected for use in
the BYU project because it seems to provide some significant insights and
a flexible framework for our research.
It should be pointed out that at present there is no completely
automatic algorithm for obtaining a detailed and powerful representation
of syntax-semantics from general English text. For this reason, other
researchers (e.g., Allen at MIT, Umeda at Bell Labs, and Levine at
Stanford) have chosen to use a simple representation which can be
obtained automatically. The authors' research, however, takes advantage
of a larger project (Lytle, 1975) which uses man-machine interaction
to obtain a more powerful representation than can be obtained
automatically. Therefore, it was decided to use the full power of
Junction Grammar representations in hopes of a future automatic analyzer
rather than use some restricted version of Junction Grammar and be forced
to add to it piece by piece to account for more and more phenomena.
To gain insight into topic (3) above (concerning which aspects
of the pitch contour are significant to perception), we experimented
with manually specified pitch contours.
In answer to question (4) above (concerning the choice of an
analysis-synthesis technique), we have chosen to work initially with an
LPC synthesis technique (as did Olive at Bell Labs and Levine at
Stanford) because an LPC software package was already available at BYU.
But long range plans include the use of an articulatory functional
model (Flanagan, 1975).
I. THEORY
We now turn our attention to certain linguistic phenomena
which we consider especially interesting. First, we will illustrate
the phenomena with sample sentences which will be discussed in
intuitive terms and then in terms of Junction Grammar junction-trees
(J-trees) and articulation-trees (A-trees). The section will conclude
with a block diagram of what a fully developed Junction Grammar text
synthesis system would look like and a block diagram of the system as
currently implemented.
A. Intuitive Presentation of Some Test Sentences 
Consider the sentence "John drove to the store." This sentence
can be read several different ways depending on the discourse context.
Figure 1 shows five possible readings and their context. Whatever system
is used to represent the linguistics of this sentence, it should be
possible to represent each of these five readings uniquely.
     Sentence                    Possible context
1a   John drove to the store.    What happened?
1b   John drove to the store.    Who drove to the store?
1c   John drove to the store.    How did John get to the store?
1d   John drove to the store.    Where did John drive?
1e   John drove to the store?    John drove to the store, you know.
     (Are you sure that's what
     you meant to say?)
Figure 1. John drove to the store.
Now consider the question "Did John or Mary come?" Suppose
that you heard someone come in but you did not see who it was.
Nevertheless, you are sure that it was either John or Mary. In this
context, you would put stress on "John" and on "Mary" and a falling
pitch at the end of the sentence. Then you would expect a reply of
"John" or "Mary." (If you receive as a reply simply "yes" then the
person responding either did not understand or is trying to be funny.)
On the other hand, suppose a whole crowd came to a party and you have
a message which you must deliver to either John or Mary. In this context,
you may or may not stress "John" and "Mary" but you would certainly end
the sentence with a rising pitch. Then you would expect a yes/no reply,
or perhaps a yes/no with additional volunteered information such as
"Yes, John is over there in the corner." Again, we would like our
system of representation to handle this distinction. The two readings
of "Did John or Mary come?" are summarized in Figure 2.
     Sentence                    Possible Response
2a   Did John or Mary come?      John came.
     (falling pitch at end)
2b   Did John or Mary come?      Yes, they are both here.
     (rising pitch at end)
Figure 2. Did John or Mary come?
Finally, consider the sentence "The boys who study get good
grades." What difference in meaning is there in stressing "study" as
opposed to stressing "boys"? The difference can be illustrated by
expanding the sentence to "The boys who study get good grades but the
others do not." If "study" is stressed, "others" is interpreted as
"boys", namely the boys who do not study. If, however, "boys" is
stressed, "others" may no longer be interpreted as "boys," but it can
be interpreted as "girls" or "men who study" or some other group of
students in contrast with boys. Once again, our system of representation
needs to handle this distinction, and handle it in a way consistent
with the treatment of other distinctions. Three readings of this
sentence are summarized in Figure 3.
     Sentence                       Possible continuation
3a   The boys who study get good    ... as is usually the case.
     grades (neutral)
3b   The boys who study get good    ... but the boys who spend all their
     grades                         time playing basketball get poor
                                    grades.
3c   The boys who study get good    ... but for some reason the girls
     grades                         (even the girls who study) get
                                    poor grades.
Figure 3. The boys who study get good grades
B. Junction Grammar Representations of the Same Sentences 
We now discuss how Junction Grammar represents the above 
distinctions in its representations. If the reader is not as yet 
familiar with Junction Grammar, it might be advisable to consult 
Appendix A before reading this section. As indicated therein, some
recent refinements of Junction Grammar are not yet available in
published form. We therefore briefly discuss two of them here. One is
the specialization of subjunction in J-trees, and the other is the
explicit representation of modalizers.
Direction of Subjunction. First consider the three major
specializations of subjunction shown in Figure 4.
Specializations of DIRECTION:
symbol   mnemonic   function
*.       right      entry of information
.*       left       recovery of information
.*.      double     non-restrictive association
Indication of REMAINDER:
hyphen   induces a remainder
equals   induces no remainder
Figure 4. Specializations of Subjunction in J-trees
A right subjunction (*.) often signifies that information is
to be entered into the hearer's memory net. For example, when we read
the sentence "I saw a lost child with a scraped knee this morning, and
I helped him find his mother," we enter (according to Junction theory)
into our memory a slot for a child who was lost. The junction between
"a" and "child" would be N ("a") *- N ("child"). If we next read the
sentence, "The child had been crying for two hours, the poor thing," we
would recover the slot for the child and add to it the information that
he had been crying. The junction between "the" and "child" in this case
would be N ("the") .* N ("child"). The third type of subjunction (.*.)
would be used, for example, in the sentence "John, our mailman, is
going to retire in March," to show that "John" and "our mailman" are
defining the same person independently (cf. the traditional
restrictive/non-restrictive distinction).
In the above examples, we considered full subjunctions (e.g.
"John, our mailman") but the same specializations apply to
interjunctions (e.g. "John, who is our mailman"). In a normal,
restrictive modification, a left subjunction is used. For example in,
"Please give me the yellow book on the second shelf," "yellow" and "book"
would be joined as follows (Fig. 5).
[Diagram: "book" and "yellow" joined at an intersect node]
Figure 5. J-tree for "yellow book"
For an explanation of the various nodes in this representation 
for a simple phrase see Lytle (1975). 
In the sentence "Of Tom, John and Rudolph, John drove to the
store," the prepositional phrase "of Tom, John and Rudolph" does not
restrict the meaning of "John" in the way "yellow" restricted "book" in
the previous example. Actually in this case, "John" restricts the scope
of the prepositional phrase. As a reflection of this, the prepositional
phrase is interjoined with "John" using a right subjunction as illustrated
in Figure 6.
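[Diagram: N ("John") interjoined by a right subjunction with the
prepositional phrase P ("of") over the N operands "Tom", "John" and
"Rudolph"]
Figure 6. Right interjunction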
We call this an example of Frame II modification because the
right subjunction is relating "John" to a second frame of reference (i.e.
Tom, John and Rudolph). On the other hand, "yellow book" is a Frame I
modification because it restricts "book" within its own frame of reference
(i.e. it determines which book we are talking about).
Remainder. The second type of specialization mentioned in
Figure 4 is an indication of remainder. The concept of remainder (Lytle,
1974) is concerned with whether all or only part of a set is referred to.
If one desires to indicate whether there is a remainder in a subjunction,
he simply replaces the dot with either a hyphen or an equals sign.
The Hyphen option. For example, from the sentence "Please give
me the yellow book on the second shelf," we must assume that there are
books of some color other than yellow on the second shelf. These other
colored books are the remainder and we could diagram "yellow book" more
specifically than before as follows (Figure 7).
[Diagram: "book" and "yellow" joined by a left hyphen subjunction]
Figure 7. Left Hyphen
The Equals option. One common case of the equals option is for
explicit modalizers (e.g. articles). For example, the phrase "The child"
could be diagrammed as follows (Figure 8).
[Diagram: "the" and "child" joined by an equals subjunction]
Figure 8. Explicit modalizer.
The identity of "child" is retrieved and placed in the article
"the", filling it entirely and leaving no remainder. However, for our
purposes, we will leave the modalizers implicit and simply use N (the) cat.
This brief discussion of specialized subjunction and modalizers
will suffice for us to reexamine the three sample sentences presented
at the beginning of the chapter, but this time in terms of J-trees and
A-trees.
"John drove to the store." Figure 9 shows the J-tree and A-tree
for the neutral reading of "John drove to the store" (sentence 1a of
Figure 1). The J-tree (a semantico-syntactic representation) is consistent
with the version of Junction Grammar described by Lytle (1975). The
A-tree (a phonological representation) is consistent with Junction
Phonology (Lytle, 1976), except that the internal structure of the V3
nodes is not shown. This A-tree specifies that the sentence is to be
pronounced in two units "John" and "drove to the store", and "drove to
the store" is further divided into "drove" and "to the store." The
subjunctions numbered 1 and 2 indicate the relations between the
sub-phrases. In an articulation tree, a left subjunction between H
constituents indicates that the right operand is prosodically subordinate
to the left operand. As for the pitch contour, a left subjunction causes
a downward pitch shift. Similarly, a right subjunction causes an upward
shift. The extra subjunction at the top of the A-tree is available for
adding prosodic feature specifications relevant to the entire sentence.
The A-tree system of representation is very flexible and a different
A-tree could be used if it were decided to group the elements of the
sentence differently. At the bottom of Figure 9 is a simplified version
of the A-tree, which is used throughout the rest of this paper to make
the trees easier to read. But it should be noted that the computer
implementation uses the trees in their full form.

[Figure: J-tree and A-tree for "John drove to the store", with a
simplified A-tree at the bottom]
Figure 9. "John drove to the store" Version 1a
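The relation between subjunction direction and pitch shift can be
sketched as follows. This is one possible reading of the statement that a
left subjunction lowers pitch and a right subjunction raises it, not the
implemented algorithm. The nested-tuple encoding, the unit step size, and
the example tree are all hypothetical, and the tree is not a transcription
of Figure 9.

```python
# Relative pitch step applied to the right operand of a subjunction.
# The unit step is an illustrative assumption.
STEP = {"left": -1, "right": +1}


def leaf_levels(tree, level=0):
    """Return (phrase, relative pitch level) pairs for the leaves of a tree.

    A tree is either a string (a leaf phrase) or a tuple
    (left_operand, direction, right_operand), where direction is
    "left" or "right". The shift applies to the whole right operand.
    """
    if isinstance(tree, str):
        return [(tree, level)]
    left, direction, right = tree
    return leaf_levels(left, level) + leaf_levels(right, level + STEP[direction])


# Hypothetical simplified A-tree with two left subjunctions.
example = ("John", "left", ("drove", "left", "to the store"))
levels = leaf_levels(example)
```

Under these assumptions each left subjunction steps the following phrase
down one level, which gives a gradually falling contour for the example.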
Having described the J-tree and A-tree for the neutral form of
"John drove to the store," we now consider how the trees differ for the
four other versions shown in Figure 1. In versions b, c and d we stress
"John," "drove" and "to the store" respectively. This stress is the
reflection of an implicit Frame II modifier in the J-tree (see Figure
10). For example, according to Junction theory, when the context is "Who
drove to the store?", "John" is implicitly modified by a right
interjunction which indicates that John has been selected out of a set of
possibilities. A possible explicit Frame II modifier would be:
"Of the persons who might have gone to the store, John drove
to the store."
[Figure: J-tree diagrams for versions 1b through 1e. A (Frame II)
marker appears on "John", "drove", and "to (the) store" in turn.]
Figure 10. "John drove to the store" Versions 1b - 1e

At this point, it is worth discussing a very general relationship
that has been observed between J-trees and English prosodic stress
(Figure 11):
(1) In a full subjunction, any time a remainder is induced
(i.e. by *- or -*) in an operand, the other operand
receives a stress (e.g. two *- boys).
(2) In an interjunction, any right interjunction causes
a stress on the primary operand, and a left hyphen
subjunction causes a stress on the V3 of the subordinate
part of the interjunction to which the topic
is joined as an enclitic.
Figure 11. J-trees and English prosodic stress
In the case of the sentence at hand, the implicit Frame II
modifier, being a right interjunction, causes the primary operand, that
is, the element to which the Frame II feature is applied, to be stressed.
Thus we have accounted for the three stressed versions of "John
drove to the store." The interrogative version (version 1e of Figure 1)
has a [+ verify] feature on the top of the J-tree. That is, the listener
is asking for verification of what was said. This feature is recoded
as a prosodic [+ verify] feature in the A-tree. Figure 12 shows the
A-trees for these five versions.
Having covered this first example in detail, let us examine the 
two other sample sentences in a more abbreviated fashion. 
[Figure: A-tree diagrams for versions 1b through 1e. (+stress) is
marked on "John", "drove", and "to the store" in turn.]
Figure 12. "John drove to the store" Versions 1b - 1e

"Did John or Mary come?" Figure 13 shows the J-tree and A-tree
for each version of "Did John or Mary come?". As seen in these figures,
the semantico-syntactic difference between the two versions is where the
interrogative is placed, on the whole sentence or on the conjoined subject.
The prosodic difference is that in version 2a, "John" and "Mary" are
stressed (stimulated by the interrogation on the OR junction), while
in version 2b, the A-tree is marked [unfinished] because of the
[yes-no interrogative] feature on the J-tree. A "finished" version would
be "Did John or Mary come or not?".

[Figure: J-trees and A-trees for versions 2a and 2b. The 2b J-tree
carries (yes/no?) on the SV node; the 2a A-tree marks (stress) on "John"
and "Mary"; the 2b A-tree is marked (unfinished phrase).]
Figure 13. "Did John or Mary come?"

"The boys who study." Figure 14 shows J-trees and A-trees for
the three versions of "The boys who study get good grades." The J-trees
differ only in the type of subjunction between "boys" and "who". In the
A-tree, "boys" or "who study" is stressed according to the type of
subjunction in the J-tree, following the rule stated above. This
concludes our discussion of how Junction Grammar handles the three sample
sentences presented at the beginning of the section.

[Figure: J-trees and A-trees for versions 3a, 3b, and 3c. In the
J-trees, version a (neutral) is labelled =* and version b (study) is
labelled -*; in the A-trees, (stress) marks the emphasized constituent.]
Figure 14. "The boys who study get good grades."

C. Text Synthesis Model
We now consider a fully-developed Junction Grammar text
synthesis system (Figure 15). This system incorporates the Junction
Grammar model of translation so that the input text might be in Spanish
and the output in English. In this full system, J-trees adjusted
(transferred) for the target language would be needed as well as fully
specified A-trees. The A-trees would include the internal structure of
the V3 nodes, and the information in the A-tree would be converted into
parameters that drive a functional analog of the vocal cords and tract.
Clearly, putting together such a system would be a very ambitious project.

[Block diagram. Labels: Input Text (written); Junction Grammar Transfer;
Adjusted J-tree; Junction Grammar Synthesis; A-tree (general
articulatory); Parameter Generation; Articulatory Parameters
(articulatory); Model of Vocal Cords and Vocal Tract; Speech (acoustic)]
Figure 15. A fully-developed Junction Grammar Text Synthesis System

A restricted version. At present, we have implemented only a
restricted version of the full system, illustrated in Figure 16. In
this system we have isolated the pitch contour from other control
parameters. Thus, we have chosen to work with an entire sentence as a
unit. Essentially, we LPC-analyze the spoken input sentence, enter a
J-tree for the sentence, recode the J-tree as an A-tree, generate a pitch
contour from the A-tree, replace the natural pitch contour with the
generated one, and LPC-synthesize to produce a spoken output sentence.

[Block diagram. Labels: Spoken Form; LPC Analysis; word boundaries and
nuclear-syllable locations; Information from Input Text;
syntax-semantics; Junction Grammar Synthesis; Pitch contour generation;
Pitch Contour; LPC analysis parameters (except pitch); LPC Synthesis;
Speech]
Figure 16. The currently implemented system.
II. METHOD
The model described in Section I provides a representation for
the semantico-syntactic information underlying prosodic contrasts and a
very flexible framework for representing phrasing and prosodic features
at the general articulatory level. But we have not yet specified how a
J-tree is recoded as an A-tree or how the pitch contour is actually
obtained from the A-tree. This chapter will describe the computer
algorithms that have been implemented to perform these two conversions.
Of course, they should not be taken as any kind of final statement
concerning the task, as they are under continuing development.
A. Recoding a J-tree as an A-tree 
The general form of the A-tree is obtained by traversing the
J-tree according to the language specific order stored in the J-tree.
At each node the algorithm decides whether or not to declare a phrase, thus
allowing nested phrases. The criteria for declaring a phrase are:
(1) The topmost node of the J-tree defines a phrase.
(2) If the predicate consists of more than a single verb and a single
object, the verb and object will be made into a phrase which
will then be joined to the subject.
(3) The contents of each subordinate tree of the J-tree (which is a
forest of trees) are phrased under the dominating tree.
(4) Each operand of a conjunction forms a phrase. 
The assignment of prosodic features to the A-tree (i.e. [+ stress],
[+ unfinished phrase], and [+ verify contour]) is fairly straightforward.
The criteria for assigning [+ stress] to a node are:
(1) A Frame XI feature in the J-tree,
(2) A left or right hyphen subjunction (indicating remainder),
(3) The operands of an "OR" interrogative.
The directionality of the subjunctions between n-constituents
in the A-tree is left except in the following situations:
(1) There is a right subjunction between the A-tree phrases from
a simple verb and its complex object in the J-tree,
(2) If a phrase is marked [+ stress], the sub-phrases of the phrase
are subordinated to it by adjusting the directionalities of the
subjunctions.
B. Background of the A-tree to Pitch Contour Algorithm 
With this overview of the J-tree to A-tree conversion algorithm, 
we describe an algorithm to obtain a pitch contour from an A-tree. The 
evolutionary phases in the development of this algorithm were: 
Plots. We plotted pitch and intensity against time for various readings
of several sentences.
Manual Contours. In order to determine which aspects of the pitch contour
are essential to natural-sounding synthesis, we programmed a system to
allow manual specification of the pitch contour with linear interpolation
between specified points and to then permit listening comparison of
synthesis outputs with natural versus manual contours.
First Algorithm. Based on these initial experiments, we programmed a
simple pitch contour algorithm that imposed on each phrase a contour
selected from a fixed inventory of contours and algebraically added in a
pitch "bubble" to the syllable of a prosodically stressed V3. In this
initial system we were able to create multiple readings of sentences
like "John drove to the store" from a single set of LPC analysis
parameters, varying only the pitch contour. In other words, we concluded that
although the perceptual phenomenon called prosodic or suprasegmental
stress is well-known to be based on several acoustic parameters, including
pitch (i.e. fundamental frequency), intensity and duration, in at least
some cases, changing only the pitch contour is sufficient to cause a word
to be perceived as stressed or not stressed. However, after considerable
theoretical discussions, we decided to abandon the approach of using a
fixed inventory of prototype contours and try a more dynamic approach,
which we will now describe.
C. Current A-tree to Pitch Contour Algorithm 
Given an A-tree and an option code to indicate initial and final
values and bounds on parameters, the algorithm assigns an initial and
final pitch based on the option code. Then the A-tree is traversed in
left-right order. Upon encountering each V3, we assign a pitch to the
core of its nuclear syllable as follows:
(1) The first V3 receives the initial pitch of the sentence.
(2) A left subjunction causes a ratio decrement (about 0.90) to the
last assigned pitch.
(3) A right subjunction causes a ratio increment (about 1.12) in
relation to the last assigned pitch.
(4) A conjunction causes no change to date, but further research
is needed.
(5) An n-constituent dominating multiple V3's receives the average
of the most recently assigned pitch level and the highest pitch
assigned to any of its operands.
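The five rules above can be illustrated with a minimal Python sketch. It is not the FORTRAN implementation described in this paper: it assumes the A-tree has already been flattened into a left-to-right list of V3s, each tagged with the kind of junction linking it to the preceding material. The names `V3`, `link`, `assign_nuclear_pitches` and `constituent_pitch` are hypothetical, and the 60 Hz and 200 Hz defaults are simply the example limits quoted later in this section.

```python
from dataclasses import dataclass

LEFT_RATIO = 0.90   # rule (2): "about 0.90"
RIGHT_RATIO = 1.12  # rule (3): "about 1.12"


@dataclass
class V3:
    text: str
    link: str  # hypothetical labels: "first", "left", "right" or "conj"


def assign_nuclear_pitches(v3s, initial_hz, low_hz=60.0, high_hz=200.0):
    """Assign a pitch (Hz) to the nuclear syllable of each V3, rules (1)-(4)."""
    pitches = []
    last = initial_hz
    for i, v3 in enumerate(v3s):
        if i == 0:
            pitch = initial_hz            # rule (1): first V3 gets initial pitch
        elif v3.link == "left":
            pitch = last * LEFT_RATIO     # rule (2): ratio decrement
        elif v3.link == "right":
            pitch = last * RIGHT_RATIO    # rule (3): ratio increment
        else:
            pitch = last                  # rule (4): conjunction, no change
        # Keep the value inside the preassigned absolute limits (assumed clamp).
        pitch = min(max(pitch, low_hz), high_hz)
        pitches.append(pitch)
        last = pitch
    return pitches


def constituent_pitch(last_assigned_hz, operand_pitches):
    """Rule (5): average of the last assigned pitch and the highest operand pitch."""
    return (last_assigned_hz + max(operand_pitches)) / 2.0
```

For example, with an initial pitch of 120 Hz, a V3 joined by a left subjunction receives 108 Hz, and a following V3 joined by a right subjunction receives 108 x 1.12 = 120.96 Hz. Each level depends only on the previous one and the tree structure, which is the "dynamic" property the text emphasizes.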
Then the contours between nuclear syllables are defined as
valleys whose depth increases with the distance in time between the
nuclear syllables they join. After the initial contour is defined, two
types of contour adjustments are added:
(1) Adjustments in the pitch contour caused by stop consonants.
We call these stop discontinuities because when the speech waveform
becomes voiced again after a stop, the pitch is significantly higher than
when the stop began but soon settles down to a value which would be
predicted by smooth interpolation of the pitch contour over the unvoiced
segment.
(2) The pitch "bubble" associated with a stressed V3.
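The valley idea can be sketched as follows. The paper does not give the valley shape or how depth scales with time, so this illustration assumes a parabolic dip subtracted from a straight line between the two nuclear-syllable pitches, with a depth proportional to the time gap. The function name `valley_segment` and the parameter `depth_hz_per_s` are hypothetical, and stop discontinuities and the stress "bubble" are not modeled here.

```python
def valley_segment(t0, f0, t1, f1, step=0.01, depth_hz_per_s=20.0):
    """Sample a pitch valley between two nuclear syllables.

    (t0, f0) and (t1, f1) are the time (s) and pitch (Hz) of the two
    nuclear syllables. Returns a list of (time, pitch) samples.
    """
    if t1 <= t0:
        raise ValueError("t1 must be later than t0")
    # Assumption: valley depth grows linearly with the time between syllables.
    depth = depth_hz_per_s * (t1 - t0)
    n = max(1, round((t1 - t0) / step))
    points = []
    for k in range(n + 1):
        x = k / n                              # normalized position, 0..1
        line = f0 + (f1 - f0) * x              # straight line between the peaks
        dip = 4.0 * depth * x * (1.0 - x)      # 0 at both ends, `depth` at midpoint
        points.append((t0 + x * (t1 - t0), line - dip))
    return points
```

With these assumed defaults, two syllables 0.2 s apart at 120 Hz produce a valley bottoming out at 116 Hz, while the same pair 0.4 s apart bottoms out at 112 Hz, so the longer gap gives the deeper valley.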
Although the above algorithm is not complete, it works
reasonably well and does have one already mentioned aspect which we
repeat here for emphasis: The pitch contour is generated from the A-tree
in a completely dynamic manner. That is, there is no fixed inventory of
pitch levels or phrase contours. Each new pitch level is assigned relative
to previous values assigned and in accordance with preassigned absolute
pitch limits (e.g. 60 Hz and 200 Hz) and the overall structure of the
A-tree. This means that, although we have so far restricted ourselves to
carefully spoken speech, this system may have the flexibility to
eventually allow synthesis of varying speech rates, i.e. very slow and
careful or very fast and sloppy speech, by appropriate option codes in the
J-tree to A-tree algorithm and the A-tree to pitch contour algorithm.
D. Sample Pitch Contours 
To conclude this chapter we present some graphs of pitch contours
for the sentence "The boys who study get good grades." Figure 17 shows a
natural, a rule-generated and a manual pitch contour for sentence 3b
("The boys who study get good grades," with stress on "study"). Figure 18
shows a natural and a rule-generated pitch contour for sentence 3c (the
same sentence with stress on "boys"). Note that these two contours are
imposed on the same set of LPC analysis parameters to produce the two
readings. Figure 18 also shows a rule-generated and a natural contour for
"The cat that the dog chased got away."
[Figure 17a. Natural contour for sentence 3b ("The boys who study get
good grades"), shown once with unvoiced segments left blank and once with
unvoiced segments filled in for easier comparison with rule-generated
contours; "boys" and "study" are marked on the time axis.]
[Figure 17b. Natural and rule-generated contours for sentence 3b.]
[Figure 17c. Natural and manual contours for sentence 3b.]
[Figure 18a. Natural and rule-generated contours for sentence 3c
("The boys who study get good grades.")]
[Figure 18b. Natural and rule-generated contours for the sentence
"The cat that the dog chased got away."]
III. EVALUATION AND DISCUSSION
We produced a demonstration tape of LPC synthesized speech using
natural, monotone, and rule-generated pitch contours. Figure 19 shows
the contents of the tape. Various subjects said that although the sentences
with rule-generated pitch contours did not sound as natural as the natural
versions, they could clearly perceive the same distinctions in the rule
versions as were made in the natural versions. Thus we established two
criteria of evaluation: naturalness of intonation, and "intelligibility"
of intonation, by which we mean a human listener can correctly perceive
which reading of a multiple-reading sentence was intended.
A. Format of the Test 
In order to obtain a quantitative evaluation of the system, we
devised the following four part test, which was presented to 17 subjects.
The sentences in the test consisted of 35 versions made from a dozen sets
of LPC analysis parameters by imposing various natural, manual, monotone,
and rule-generated pitch contours on them. In the first part listeners
were asked to rate readings of 34 sentences on a scale from 1 to 5, where
"1" meant the intonation sounded mechanical or monotone, and "5" meant
the intonation sounded natural. In the second part, listeners were
presented with 24 sentence pairs and asked to indicate whether the first
or second sentence sounded more natural.
The third and fourth parts of the test dealt with intelligibility
of intonation. In both of these parts, the subjects heard a sentence and
indicated which of several possible readings the intonation was intended
to convey. The only difference between these last two parts was the
method of designating the different readings. In the third part, the
readings were designated by underlining and using a period or question
mark at the end. In the fourth part, the readings were designated by an
indication of a typical context for that reading. (Appendix D contains
additional details of the test and the results.)
NATURAL vs. GENERATED INTONATION
Natural Intonation          Generated Intonation
1. John drove to the store.          2. John drove to the store. (monotone)
3. John drove to the store.
4. John drove to the store.
5. John drove to the store.
6. Did John or Mary come?
7. Did John or Mary come?
11. The boys who study get good grades.
12. The boys who study get good grades.
16. They are eating apples.
17. They are eating apples.
20. I have one.
21. I have one.
24. The cat that the dog chased got away.
26. John buys rice?
8. Did John or Mary come? (monotone)
9. Did John or Mary come?
10. Did John or Mary come?
13. The boys who study get good grades. (monotone)
14. The boys who study get good grades.
15. The boys who study get good grades.
18. They are eating apples.
19. They are eating apples.
22. I have one.
23. I have one.
25. The cat that the dog chased got away.
27. John buys rice?
Figure 19. Contents of Preliminary Test Tape
B. Test Results 
Table 1 gives the results of the first part, where sentences
were rated on a scale from 1 (mechanical) to 5 (natural). Natural pitch
contours received the highest score as expected, followed by manual contours
based on the natural contour, rule-generated contours and monotone
contours, in that order.
Table 1. COMPOSITE AVERAGE SCORES
Natural    Manual    Rule-generated    Monotone
4.14       3.76      3.61              1.24
A paired t-test applied to the average scores for natural and 
rule contours for each listener showed a statistically significant overall 
preference for natural contours. 
In part 2, in a balanced subset of 42 paired comparisons where 
natural, manual and rule versions were paired in all possible ways, the 
natural contours received 87 votes, the manual ones received 76 and the 
rule contours received 41. Several subjects mentioned after the test
that the natural, hand and rule versions of the second sentence ("The cat
that the dog chased got away") were indistinguishable in naturalness of
intonation.
Using a non-parametric sign test technique, we postulated
that if there were a significant preference for one pitch contour method
over another, the listeners would be consistent in their choice, regardless
of the order of presentation. Specifically, if four or fewer subjects
out of 17 changed their minds, we can conclude a preference for a given
pair and its reverse.
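The "four or fewer out of 17" cutoff can be checked with a short binomial calculation. This is a sketch under an assumption the paper does not state explicitly: that under the null hypothesis of no preference, each subject is equally likely to keep or to change a choice when the presentation order is reversed. The function name `prob_at_most` is hypothetical.

```python
from math import comb


def prob_at_most(k_max, n, p=0.5):
    """Probability of at most k_max 'changes' among n subjects (binomial)."""
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(k_max + 1))


# One-sided tail probabilities under the assumed 50/50 null.
print(round(prob_at_most(4, 17), 4))  # 0.0245
print(round(prob_at_most(5, 17), 4))  # 0.0717
```

Under this assumption, four or fewer changes would occur by chance with probability of about .025, while five or fewer would occur with probability of about .072, so four is the largest count that stays below the .05 level.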
Using this criterion, we found that for the first sentence, the 
natural version was significantly preferred but for the second sentence, 
there was no clear preference for the natural over the rule version. 
In parts 3 and 4, we tested for "intelligibility" of intonation
by presenting sentences and asking which of several possible readings
was intended. We evaluated the results of this part by preparing
confusion matrices (Figure 20). Each one deals with readings of a single
sentence, showing reading transmitted and pitch contour method (N=natural,
R=rule) compared to reading received by the listeners. All readings are
listed in Appendix D.
A simple Chi-Square test shows that for a given row of one of 
these confusion matrices, 24 correct votes out of 33 or 34 are sufficient 
to show significance at the .05 level. Results for part 4 were similar.
C. Transmission Problems 
Some of the sentences were not well transmitted by the above
definition. A consideration of these indicates the kinds of problems
that arose. For example, since the first word of any normal declarative
sentence receives some extra stress, the listeners had difficulty
distinguishing the neutral reading of "John drove to the store" from the
reading with stress on "John."
Another problem sentence was "Did John or Mary come?" Although the two
rule versions were clearly distinguishable (one with falling and one
with rising terminal intonation), the listeners made many incorrect
choices. This may have been due to either of the following two factors:
(1) As with the other sentences, all the rule versions were based
on a single set of analysis parameters, and duration was held constant.
In this sentence, duration plays a greater role than in others, and this
may have influenced judgment.
(2) There may have been some confusion about what the versions meant,
and there may have been confusion with a possible third reading in
which "John" and "Mary" are stressed and yet the intonation is rising
at the end.
[Figure 20. Confusion matrices for part 3, one each for "John drove to
the store" (versions 1a-1d), "I have one" (versions 5a, 5b) and "The boys
who study get good grades" (versions 3a-3c), each comparing version sent
with version received.]
D. Termination Problems
Another problem mentioned by several subjects was that the
intonation on some versions (rule and hand versions only) was natural up
until the very end of the sentence. We have determined that this is a
problem in shaping the contour from the last nuclear syllable to the
final pitch of the sentence, assigning an appropriate final pitch, and
determining the interaction between the pitch of the last nuclear syllable
and the sentence final pitch. Further research is needed in this area.
E. Discussion
This paper is the report of an attempt to generate pitch
contours in speech synthesis using Junction Grammar as a theoretical
base. Since the various readings of each sentence were made by imposing
different pitch contours on the same analysis parameters without changing
durations, some versions were less than natural. However, this was to
be expected and we feel that it was even desirable in that it pointed
out some specific cases in which duration adjustments are necessary.
The evaluation also pointed out the need for further research on the
shaping of the contour from the last V3 to the end of the sentence. We
also realize the need to incorporate some refinements into the system
in order to
(1) make degrees of adjustment for fricatives and stops,
(2) improve the naturalness of the contours between nuclear syllables,
(3) make adjustments for the inherent pitch of vowels (Flanagan
and Landgraf, 1968).
Based on the results of the evaluation test, we feel it is 
appropriate to continue use of the Junction Grammar framework and to 
attempt to develop a word concatenation version with duration, pause and 
intensity calculations, to attempt better shaping of the contour after
the last nuclear syllable, and to examine many more sentence types in
order to further test the adequacy of this framework for dealing with 
the problem of generating prosodic control parameters in speech synthesis. 
ACKNOWLEDGMENTS
The author would like to express deep appreciation to co-author 
Eldon Lytle for many theoretical discussions, to W.J. Strong for sharing 
his acoustical expertise and LPC analysis-synthesis programs, and to 
Ronald Millett for his excellent suggestions in our innumerable discussions
during this research and for doing the FORTRAN coding of the J-tree 
input, display mechanism, the conversion algorithms from J-tree to 
A-tree, and from A-tree to pitch contour. 
APPENDICES 
APPENDIX A 
BACKGROUND READING 
If the reader desires further background in acoustic speech
processing and/or Junction Grammar, the following sources may be helpful. 
ACOUSTIC SPEECH PROCESSING: 
(1) The Speech Chain, P.B. Denes and E.N. Pinson, (Garden City, N.Y.:
Doubleday Anchor Books, 1973)
(an excellent non-technical overview)
(2) Speech Analysis Synthesis and Perception, J.L. Flanagan, (New
York: Springer-Verlag, 1972) (a thorough technical presentation)
(3) Speech Synthesis, edited by J. Flanagan and L. Rabiner,
(Stroudsburg, Penn.: Dowden, Hutchinson and Ross, 1973)
(a collection of key historical and current professional
articles)
JUNCTION GRAMMAR: 
(1) A Grammar of Subordinate Structures in English, (Lytle, 1974) 
(A Description of Junction Grammar. The concepts discussed are 
still valid in Junction Grammar theory but the notation has 
changed significantly) 
(2) AJCL microfiche #26
"JG as a Base for Natural Language Processing."
(The first chapter is a good introduction to JG but does not
go into much detail)
(3) BYU Linguistics class textbooks. There are several Linguistics
classes at BYU in Junction Grammar. Ling 426 is an introductory
course and Ling 501 is an intermediate class. The textbooks are
still in development and have not yet been published but if the
reader would like more detail than is available in the first
two sources, he can write the BYU Linguistics department for
copies of class handouts for Ling 426 and Ling 501. The 501
textbook is the only available source on specialized subjunction.
(4) "Junction Theory as a Base for Dynamic Phonological Representation."
BYU Linguistics Symposium, March 1976. (This is the only
available document on the A-tree extension of JG. It is reprinted
at the end of this microfiche, for the convenience of the reader.)
GLOSSARY
A/D
Analog to digital
A-tree
Articulation tree
D/A
Digital to analog
ENCLITIC
A word which generally combines with the following word into a
single V3, e.g. "the," "what," "or."
F0
Fundamental frequency
HERTZ (HZ)
1 Hz = 1 cycle/second
JG
Junction Grammar
J-tree
Junction tree (contains semantico-syntactic information)
LPC
Linear predictor coefficient
NUCLEAR SYLLABLE
The ranking syllable of a V3; in Isacenko (1970) it is called
the ictus.
PITCH
In this paper pitch contour is used to mean fundamental frequency
contour.
PROSODICS
There are word-boundary effects, phrase-level stress contours, and
clause-level phenomena which affect the waveform. These factors are
referred to as the suprasegmental or prosodic features of speech.
SUPRASEGMENTAL FEATURES
See prosodics.
TEXT SYNTHESIS
Typed-sentence to code to speech-waveform.
V3
A syllable. See Lytle (1976) for a more precise definition.
APPENDIX C
COMPUTER IMPLEMENTATION
The pitch contour generation system described in this paper has
been implemented on a PDP-15 computer, equipped with a variety of
peripheral devices configured as shown in Figure 21. The VT-15 allows
the user to call a package of subroutines from FORTRAN to plot points or
draw lines or characters. The system uses the DEC supplied DOS-15
operating system.
The PDP-15 is equipped with 32 K 18-bit words. This is not
enough memory for our main pitch contour generation program so we use
the DOS-15 CHAIN AND EXECUTE facility to overlay programs that need not
be core resident simultaneously.
As indicated in Figure 21, there are two disk drives on the
system. One is a standard DOS-15 system pack for system programs and
user files. The other drive is mainly for speech data. Data on packs
mounted on this drive is accessed through special assembler subroutines
that are not part of the DOS-15 operating system. This allows the user
to store data contiguously at a higher transfer rate than possible
using standard DOS-15 files. This is especially important in transferring
large amounts of data from the A/D to disk or from the disk to the D/A
in real time. Thus the system can deal with longer segments of speech
than can be stored in in-core buffers at one time.
In order to describe the pitch contour system, we will describe
the major data files and off-line support programs the system requires.
For each sentence to be processed, the system needs (1) an entry in a
speech directory file (SPCDTR) which indicates the address and length on
the speech data disk of the LPC analysis parameters; (2) an identification
(ID) file which specifies the word boundaries, etc. and the file names
of the J-tree files for the various readings of the sentence (the
J-tree contains keys to obtain lexical information about each word from
a master lexicon file); and (3) a J-tree file for each reading.
[Figure 21. Hardware Configuration. Components shown: PDP-15 CPU and I/O;
DOS-15 system disk drive; speech data disk drive; D/A; DEC tape drives;
Decwriter II hardcopy terminal; paper tape reader and punch; VT-15
graphics display unit with light pen.]
In order to prepare a sentence for processing, it is tape
recorded, then digitized at a 10 kHz sampling rate using a program called
DIGTXZ. Then it is LPC analyzed and optionally examined on the graphics
display, using a program called ANAPLT. The "PLT" at the end of the name
refers to the fact that this program will also produce a hard copy plot
of the pitch contour if desired.
The pitch contour generation program is called JTSPCH ("J-Tree
to speech"). When this program is executed, it presents a list of
available sentences and asks the user to indicate which reading to use
in this case. Then the program reads the J-tree file and creates a
J-tree in postfix notation. The program then optionally displays the
J-tree on the graphics unit, depending on the status of the console sense
switches. Then the J-tree is converted to an A-tree, which again is
optionally displayed. Then a pitch contour is generated from the A-tree
and displayed. Finally, the pitch contour is combined with the LPC
analysis parameters retrieved from disk (gain factor, voiced/unvoiced
decision and 12 linear predictor coefficients per 10 msec of speech
waveform) and the combined parameters are used to synthesize a speech
waveform which is stored on a temporary disk area and repeatedly played
through the D/A converter to a loudspeaker or headphones for evaluation.
If desired, the user can then save it permanently on disk. Another
processing option is to create a manual pitch contour instead of
generating it from an A-tree. The manual contour can be entered either by
drawing it on the graphics unit with the light pen or by entering a
list of time and pitch coordinates on the teletype to a subroutine that
interpolates linearly between them. Of course, the sentence can also
be synthesized using the natural pitch contour retrieved from the
original analysis data.
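The manual-contour option amounts to piecewise-linear interpolation of a few (time, pitch) points onto the 10 msec frame grid used by the LPC parameters. The following Python sketch illustrates that idea only; the original subroutine was written in FORTRAN and its interface is not given here, so the function name `interpolate_contour` and its arguments are hypothetical.

```python
def interpolate_contour(points, frame_s=0.01):
    """Linearly interpolate (time_s, pitch_hz) points onto a frame grid.

    `points` must be sorted by strictly increasing time. Returns one pitch
    value per frame, from the first point's time to the last point's time.
    """
    if len(points) < 2:
        raise ValueError("need at least two points")
    for (ta, _), (tb, _) in zip(points, points[1:]):
        if tb <= ta:
            raise ValueError("times must be strictly increasing")
    start, end = points[0][0], points[-1][0]
    n_frames = int(round((end - start) / frame_s)) + 1
    contour = []
    seg = 0  # index of the segment [points[seg], points[seg + 1]] in use
    for i in range(n_frames):
        t = min(start + i * frame_s, end)
        # Advance to the segment that contains time t.
        while seg < len(points) - 2 and t > points[seg + 1][0]:
            seg += 1
        (ta, fa), (tb, fb) = points[seg], points[seg + 1]
        contour.append(fa + (fb - fa) * (t - ta) / (tb - ta))
    return contour
```

For instance, the three points (0.0 s, 100 Hz), (0.1 s, 120 Hz) and (0.3 s, 80 Hz) yield 31 frame values rising to 120 Hz at frame 10 and falling to 80 Hz at frame 30. The resulting list could then replace the pitch parameter of the corresponding LPC frames before synthesis.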
After saving several synthesized sentences, one can listen to
a list of sentences with any desired pause between them using a multiple
listening program called MULTIL. MULTIL can receive its control input
from either the teletype or from a data file. This option allowed us to
create a control file with the regular editing facilities of the
operating system and then instruct MULTIL to read it, creating the
evaluation test tape in one continuous recording session without any
tape splicing.
APPENDIX D 
MORE DETAILS ON THE EVALUATION
This appendix contains the following information:
An edited version of the evaluation response form given to the
subjects and then four tables showing all responses. Note that
the parts of the response form are numbered IA, IB, IIA and
IIB. This edited response form shows which versions were used
throughout the test but does not contain certain unnecessary
details present in the actual response form used. Each version
is identified by a code consisting of a number (1-8), a letter
(a-e), a letter (N, R, M or H) and possibly another number (1-4).
The first two characters identify the sentence and reading,
as follows:
(1) a. John drove to the store.
b. John drove to the store.
c. John drove to the store.
d. John drove to the store.
e. John drove to the store?
(2) a. Did John or Mary come? (falling at end)
b. Did John or Mary come? (rising at end)
(3) a. The boys who study get good grades.
b. The boys who study get good grades.
c. The boys who study get good grades.
(4) a. They are eating apples.
b. They are eating apples.
(5) a. I have one.
b. I have one.
(6) a. John, Joe and Fred buy rice.
(7) a. The cat that the dog chased got away.
(8) a. John buys rice.
b. John buys rice.
c. John buys rice.
d. John buys rice.
e. John buys rice?
The next character identifies the nature of the pitch contour as follows:
N = Natural
R = Rule (generated by rule)
M = Monotone (constant fundamental frequency)
H = Hand (manually specified)
If a number follows the H, it indicates which hand-made contour was used.
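The version codes used in the rest of this appendix (for example 2aH1 or 7aR) follow the scheme just described and can be decoded mechanically. The sketch below is only an illustration of that scheme: the names `VersionCode` and `parse_version_code` are hypothetical, and it assumes the trailing digit, when present, selects one of the hand-made contours.

```python
import re
from typing import NamedTuple, Optional

CONTOUR_TYPES = {"N": "natural", "R": "rule", "M": "monotone", "H": "hand"}

# sentence number (1-8), reading letter (a-e), contour letter, optional digit (1-4)
_CODE = re.compile(r"^([1-8])([a-e])([NRMH])([1-4])?$")


class VersionCode(NamedTuple):
    sentence: int
    reading: str
    contour: str
    variant: Optional[int]  # which hand-made contour, if a digit is given


def parse_version_code(code: str) -> VersionCode:
    """Split a code such as '2aH1' into its documented parts."""
    match = _CODE.match(code)
    if match is None:
        raise ValueError(f"not a valid version code: {code!r}")
    sentence, reading, contour, variant = match.groups()
    return VersionCode(
        int(sentence),
        reading,
        CONTOUR_TYPES[contour],
        int(variant) if variant else None,
    )
```

For example, `parse_version_code("2aH1")` describes sentence 2, reading a, with hand-made contour number 1, and `parse_version_code("7aR")` describes sentence 7, reading a, with a rule-generated contour.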
RESPONSE FORM 
Date
Name          Age          Sex
Occupation 
I. NATURALNESS OF INTONATION 
A. Below are two lists of the same 34 sentences. You will 
hear the first list with a $ second pause after each sentence. Just 
listen and don't write anything. Then 10 seconds later, you will hear
the second list with a 3 second pause after each sentence. This time,
during the pauses, rate each sentence by writing down a number after
it. The rating scale is 1 to 5. Remember that the evaluation criterion
is intonation only.
So please do not let your judgements be influenced by crackles or pops
or hisses.
A rating of 1 means the intonation sounded mechanical or unnatural, for
example, monotone or the way computers talk in cartoons. A rating of
5 means the intonation sounded natural, that is, you can imagine the
sentence was produced by a human speaker speaking carefully. Please
try to distribute your scores over the entire range from 1 to 5.
Before you begin, please read over the entire test to become 
familiar with it, because you will have only a few seconds to respond 
to each question. 
The test will last 17 minutes. 
(The following four pages are an edited, abbreviated form of the rest
of the response sheets. The codes in parentheses were not on the
actual response sheets. By consulting the key on the previous pages of
this appendix, the reader can determine from the codes which version
was used for each question.)
I A.
1. I have one.
2. The cat that the dog chased got away.
3. Did John or Mary come? 
etc. 
33. The cat that the dog chased got away. 
34. John drove to the store. 
SECOND TIME THROUGH: Rate each sentence (1)Mechanical to 
(5)Natural. 
1, I have one. .............. f5b~) 
2. The cat that the dog chased got away. (7aR)
3. Did John or Mary come? (2bR)
The rest of part IA will be shown in 
abbreviated form. 
34. John drove to the store ....... (law 
Pair Number 
1st sounded 2nd more 
more natural natural 
J J 
........ 
1. Did John or Mary come?. (2aN) (2a~) 
2. Did John or Mary come? (2aH1) (2aH2)
3. Did John or Mary come?. . . . . . . (2aR) (2aN) 
Questions 4-12 deal with 
sentence 2a using various 
pitch contours. 
Questions 13-24 deal with 
sentence 7a using various 
pitch contours.
13(R,N) 14(H1,N) 15(H1,R) 16(R,H4) 17(N,H4)
18(R,H1) 19(H4,R) 20(H4,H1) 21(H4,N) 22(R,N)
23(H1,H4) 24(N,R)
1. John buys rice (8dR) 
a. John buys rice. 
b. John buys rice.
c. John buys rice. 
d. John buys rice. 
e. John buys rice? 
2. Did John or Mary come (2aN) 
a. Did John or Mary come? 
b. Did John or Mary come?
The rest of part IIA will be shown 
in abbreviated form. 
II B.
1. They are eating apples (4a~)
a. They are in the process of eating apples.
b. These apples are a variety good for eating as
opposed to baking.
2. The boys who study get good grades (3b~)
a. Neutral
b. But the boys who play around get bad grades.
c. But the girls who study don't get good grades.
3. Did John or Mary come (2aR)
a. Somebody came. Was it John or was it Mary?
b. Several people came. Did the group include John
or Mary?
4. John drove to the store (1bR)
a. In response to: "What happened?"
b. In response to: "Who drove to the store?"
c. In response to: "How did John get to the store?"
d. In response to: "Where did John drive?"
e. To ask for verification of what was said.
5. I have one (5bN)
a. But you have three.
b. But you don't.
6. John drove to the store (1cR)
a. In response to: "What happened?"
b. In response to: "Who drove to the store?"
c. In response to: "How did John get to the store?"
d. In response to: "Where did John drive?"
e. To ask for verification of what was said.
7. Did John or Mary come (2bN)
a. Somebody came. Was it John or was it Mary?
b. Several people came. Did the group include John
or Mary?
8. They are eating apples (4bN)
a. They are in the process of eating apples.
b. These apples are a variety good for eating as
opposed to baking.
9. The boys who study get good grades. (3cR)
a. Neutral
b. But the boys who play around get bad grades.
c. But the girls who study don't get good grades.
10. I have one (5aR)
a. But you have three.
b. But you don't.
Table D-1
The responses for part IA. Each row gives the response
of subjects 1 through 17 to a particular question. A zero
response means the subject left that question blank.
Table D-2
Responses for part IB. "1" means the subject chose the
first element of a pair; "2" means the second element.
Table D-3
Responses for part IIA. "1" means the subject chose version "a",
"2" means "b", "3" means "c", "4" means "d", "5" means "e", and
"0" means no response.
Table D-4
Responses for part IIB. (Same format as Table D-3.)
JUNCTION THEORY AS A BASE 
FOR 
DYNAMIC PHONOLOGICAL REPRESENTATION 
BY 
Eldon G. Lytle 
BYU Linguistics Symposium 1976 
Orientation 
MacNeflage has pointed up the difficulty of mediating between abstract 
unitary phonological representations and the continuous nature of 
the 
dynamic speech chain, suggesting that unitary phonological represerrtations 
are analogous to a sequence of eggs conveyed to the wringer of a washing 
machine, while the scrambled mess that emerges fro9 the wringer is what 
must actually be dealt with by those engaged in computer analpsis and 
1 
synthesis of voice. The quqstion, as he states it, is: 
    Given that there is a discrete linguistic input to the
    mechanism of speech production at some state, and given
    that the mechanism that transmits this input is incapable
    of discrete units of output, what is the nature of the
    transformation, at the peripheral stage, of one form to
    the other.
5 
Lieberman likewise notes a relative neglect of the phonetic level of
speech, concluding that a quantitative and explicit phonetic theory has
yet to be developed, and suggesting that a successful attempt to construct
such a theory should be structured in terms of the anatomic, physiologic,
and neural mechanisms of speech production and perception.
Onn, similarly motivated by the notion that speech ought to be
described in the context of the organic mechanisms responsible for it,
suggests that:
    It may be argued that an abstract representation may be
    regarded as instructions for particular types of behavior
    of the speech-generating mechanism. When these instructions
    are carried out, the various reactions occurring between
    afferent physiological structures will yield a quasi-continuous
    gesture in which the discrete instructions initiating
    the gesture are no longer always observable as distinct
    components. Finally, the execution of these instructions
    produces the acoustic signal.⁴
The purpose of the present paper is to outline briefly a new system
of phonological description currently being used as a basis for voice
synthesis at BYU which attempts to satisfy the criteria suggested by
MacNeilage, Lieberman, and Onn referenced above. The descriptive system
in question is based on the Junction Grammar Model of language developed by
myself and my colleagues over the past eight years.⁵ It is a model
specifically structured in terms of speech-related organs, either as they
are known or hypothesized.
An Overview of the Junction Grammar Model 
A fundamental tenet of junction theory is that linguistic description
must involve not simply multiple stages of derivation, but multiple types
of data and data processing required to simulate the functions of different
body organs. (See Figure 1.) Thus, the semantic components of the grammar
are designed to process data structured for specific semantic tracts, as it
were; the articulatory component is designed to process data structured for
the vocal tract; the audio component is designed to process data structured
for the auditory tract; and so on. Of course, such a model requires distinct
rule systems and procedures to operate on the different data types in the
various tracts.
Figure 1.
A further tenet of junction theory is that data types may not be
intermingled. To do so would, for example, be tantamount to feeding
instructions for both the heart and diaphragm to the diaphragm. Of
course, semantic instructions could not be executed by a vocal tract,
nor could articulatory instructions be executed by a semantic tract. This
means, in effect, that a "deep structure" is not transformed (in the usual
sense of the word) into a surface structure, but rather that semantic data
must be used to stimulate articulatory instructions, orthographic instructions,
motor instructions required to produce gestures, to make one blush, etc.
Thus, in JG semantic representations there are no lexical items, since
these are considered to be articulatory instructions. Similarly, there
is no semantic information in phonological representations, since these are
a different data type. The various data types are considered to be
symbolizations of each other, not transforms or derivations of each other. Data
stimulation between the various tracts or components of the system is
accomplished by context-sensitive coding/decoding procedures, which are
intended to simulate the neural interfaces which coordinate the function
of body organs involved in speech production.
Junction Grammar takes its name from Junction Rules (J-rules). (See
Figure 2.) J-rules structure data to be processed by the various components
of the grammar. The essential ingredients of every J-rule are two or more
operands, an operation specifying how the operands are to be joined, and
a labelling operation which assigns a category to the operands taken as
a unit. Thus, in junction grammar not only do rules for conjunction require
an operation symbol (viz. the phrase structure rule S → S & S), but all J-rules,
regardless of their specialization, require one.
JUNCTION FORMULA WITH LABELLED PARTS
(Labelled parts: operand, junction operation, secondary operand, labelling
operation, category of the resultant constituent.)
Figure 2.
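The anatomy of a J-rule described above (operands, a junction operation, and a labelling operation) can be sketched as a small data structure. This is only an illustrative sketch: the class names `Leaf` and `Junction`, the method `formula`, and the operation symbols are assumptions introduced here, not notation taken from the paper.

```python
from dataclasses import dataclass
from typing import Tuple, Union

# Hypothetical operation symbols, chosen for illustration only.
ADJUNCTION = "+"
SUBJUNCTION = "*"
CONJUNCTION = "&"


@dataclass(frozen=True)
class Leaf:
    """A terminal operand, e.g. a sememe or an articuleme."""
    symbol: str


@dataclass(frozen=True)
class Junction:
    """Two or more operands, the operation that joins them, and the
    category assigned to the operands taken as a unit."""
    operands: Tuple[Union["Junction", Leaf], ...]
    operation: str
    label: str

    def __post_init__(self) -> None:
        if len(self.operands) < 2:
            raise ValueError("a J-rule joins at least two operands")

    def formula(self) -> str:
        # Nested junctions are shown by their own label.
        names = [o.label if isinstance(o, Junction) else o.symbol
                 for o in self.operands]
        return f" {self.operation} ".join(names) + f" = {self.label}"


rule = Junction((Leaf("S"), Leaf("S")), CONJUNCTION, "S")
print(rule.formula())
```

Keeping the operation and the label as explicit fields mirrors the claim that every J-rule, not only conjunction, carries an operation symbol.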
A schematic of the model in its present form is given in Figure 3.
Basic semantic data is presumed to reside in the form of an information net.
Drawing upon information in the net, J-rules organize and structure information
pragmatically, i.e. for use in specific utterances in specific discourse
environments. Fillmore's arguments for semantic case relate specifically
to the need to distinguish between basic semantic relations and pragmatically
motivated grammatical relations. The semantic junction trees (J-trees)
generated by J-rules then serve as the basis for coding up articulatory
instructions, instructions to the arm and hand for writing, or motor
instructions of sundry types necessary to produce body language.
Incoming information, on the other hand, is decoded to obtain the 
pragmatic J-tree which stimulated it, and then each junction in the tree 
is executed by a semantic processor, resulting in additions to or changes 
in the information net. 
Junction trees occur in both semantic and articulatory data. However,
the operands and operations are of a totally different nature from type to
type, since in the semantic component they constitute complexes of instructions
to be executed by the semantic processor, while in the articulatory component
they constitute complexes of instructions to be executed by the vocal tract.
The operands of semantic trees are sememes, i.e. units which define locations
and states in the information net; the operands of articulation trees are
articulemes, i.e. units which relate to locations and states of the vocal tract.
Figures 4 and 5 are the semantic and articulation trees, respectively, for the
utterance [Hwayǰə iyt]. Notice, specifically, that while Why did you are not
immediate semantic constituents, they are immediate articulatory constituents.
The point again, of course, is that while articulatory structure and semantic
structure are symbolically related, they are not the same and should not be
confused or intermingled.
[Figure 3. Schematic of the model. Labels recoverable from the diagram:
Discourse Monitor; J-rule Coding; Basic Semantic; J-tree Compilation;
(Pragmatic Data); Gestured Input; Graphic Input; Audio Input.]
Semantic tree for Why did you eat?
[Tree diagram not recoverable from the source.]
Words represent sememes. There is no lexical data in
semantic trees.
Figure 4.
Articulation tree for [Hwayǰə iyt]
[Tree diagram not recoverable from the source.]
Segmentals and suprasegmentals represent
articulatory units. There is no semantic
data in A-trees.
Figure 5.
Basic Junction Types
Junction theory posits three basic junction operations and numerous
subtypes depending upon the data type being described.
(1) Adjunction results in the formation of certain nuclear units
which serve as a skeleton to which other elements may attach. In semantic
trees, predicates and predications are formed via adjunction. In
articulation trees, semi-syllables and syllables are formed via adjunction.
(2) Subjunction results in overlapping constituents of contrasting
rank, i.e. where one is in some sense subordinate to the other. In semantic
trees, modifiers in all their variety are subjoined. In articulation trees,
clustered consonants are subjoined, as well as adjacent syllables having
different degrees of stress. Segmental structures are also subjoined to
prosodic constituents to account for the supra-segmental aspects of
articulation.
(3) Conjunction results in the formation of compounds consisting
of units of the same category and rank. In semantic trees, compounds
based on and, or, and but are formed via conjunction. In A-trees,
conjunction yields evenly spaced non-overlapping units having the same degree
of stress.
Now, in the context of this rather general introduction to the subject,
let us consider dynamic phonological representations corresponding to the
articulatory structure of syllables, words, and phrases.
The Syllable
The intuitive articulatory unit of which words consist is the syllable,
which is in turn composed of phonemes. Generally speaking, syllables have
as their nuclear component a continuous phoneme with vocalic properties.
This nuclear phoneme may be delimited both initially and finally by a
phoneme having consonantal properties. Hence, we observe syllables of the
following string types:
D = delimiter; W = nucleus; ∅ is null
DWD
DW∅
∅WD
∅W∅
If, however, we invoke the concept of a null delimiter ∅, then these four
syllable patterns can be reduced to a single type, DWD, where D may be
either null or non-null. The use of the null delimiter ∅ is actually more
than a simplifying assumption, since in many cases non-null segmentals
replace ∅ in the articulation stream either as full geminates or partials
of neighboring delimiters.
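The reduction of the four string types to a single DWD pattern can be illustrated with a short sketch. The function names `to_dwd` and `pattern`, and the use of an empty string for a missing delimiter, are assumptions made for this example only.

```python
NULL = "∅"  # the null delimiter


def to_dwd(initial, nucleus, final):
    """Return a (D, W, D) triple, filling any missing delimiter with null."""
    if not nucleus:
        raise ValueError("every syllable needs a nucleus")
    return (initial or NULL, nucleus, final or NULL)


def pattern(syllable):
    """Name the string type of a (D, W, D) triple, e.g. 'DW∅'."""
    initial, _, final = syllable
    left = NULL if initial == NULL else "D"
    right = NULL if final == NULL else "D"
    return left + "W" + right


# The four observed string types, all stored in the same three-slot shape.
for parts in [("b", "i", "t"), ("b", "i", ""), ("", "i", "t"), ("", "i", "")]:
    syllable = to_dwd(*parts)
    print(syllable, pattern(syllable))
```

Because every syllable has the same three slots, a later rule that replaces a null delimiter with material from a neighboring delimiter only has to overwrite a slot; it never has to change the shape of the syllable.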
Articulatory Adjunction
As noted above, junction theory attributes to adjunction those kernel
configurations upon which all else is built up. Since syllables are the
intuitive units from which words and phrases are formed, we attribute them
to adjunction.
There are two basic syllable types, corresponding to whether the
syllabic nucleus is joined to the initial or final delimiter. The two
cases are illustrated in Figure 6.
NUCLEAR-INITIAL SYLLABLE        NUCLEAR-FINAL SYLLABLE
Figure 6. Two basic syllable types.
Recent research provides useful criteria for deciding when to use each
type. Bell-Berti and Harris report that:
    The effects of the terminal consonant on the midpoint of the
    stressed vowel are not as large as those of the initial
    consonant. In other words, the carryover effect of the first
    consonant on the stressed vowel is larger than the anticipatory
    effect on the second.
For the purposes of this discussion, let us assume that stressed
syllables and syllables with strong vowels are nuclear-initial and that
other syllables are nuclear-final. It is possible, of course, to formulate
junction rules which are not binary, so that a third syllable type whose
nucleus was equally joined to both initial and final delimiters could be
used. We avoid this formal complication, however, until forced to
introduce it by empirical considerations.
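The two binary syllable types can be sketched as nested pairs. The sketch reads "nuclear-initial" as "nucleus grouped first with the initial delimiter" and "nuclear-final" as "nucleus grouped first with the final delimiter"; that reading, the function names, and the ASCII segment spellings are assumptions of this example rather than definitions from the text.

```python
def adjoin_syllable(initial, nucleus, final, nuclear_initial):
    """Build a binary syllable tree as nested tuples.

    nuclear_initial=True  -> ((D, W), D): nucleus joined to the initial delimiter
    nuclear_initial=False -> (D, (W, D)): nucleus joined to the final delimiter
    """
    if nuclear_initial:
        return ((initial, nucleus), final)
    return (initial, (nucleus, final))


def segments(tree):
    """Flatten a syllable tree back into its left-to-right segment string."""
    if isinstance(tree, str):
        return [tree]
    flat = []
    for child in tree:
        flat.extend(segments(child))
    return flat


# Working assumption of the text: stressed syllables are nuclear-initial,
# other syllables are nuclear-final.
stressed = adjoin_syllable("b", "ae", "m", nuclear_initial=True)
unstressed = adjoin_syllable("b", "i", "∅", nuclear_initial=False)
print(stressed, segments(stressed))
print(unstressed, segments(unstressed))
```

Both trees flatten to an ordinary DWD string, so the difference between the two types lives entirely in the bracketing, which is where the sketch would record the stronger carryover or anticipatory coupling.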
Notice that the use of structure to represent syllables makes it
unnecessary to use a feature such as [+syllabic]. In comparing the use of
this feature to that of the structural notation proposed, we note that each
appears to make distinct claims about the notion syllable. Specifically, the
feature asserts that a vowel is syllabic, whereas the tree claims that
specific sequences of segmentals constitute syllables whose nuclear element
is a particular segment.
Node Labels 
Turning now to the matter of node labels, we observe that in practice
it is desirable to further subcategorize D and W in terms of more specific
articulation classes. We therefore define D to include obstruent consonants
(C), liquids (L), glides (G), and null (∅). For W, vowels (V) and liquids (L)
are indicated, and perhaps in some cases even continuant obstruents, assuming
that expressions such as vocative "pssst" are to be analyzed as syllables
also. We note parenthetically that glides (G) are suspect, since they appear
to be functional variants of vowels, i.e. vowels functioning delimitively.
This, however, is not a problem, since the use of J-rules to represent
articulatory structures makes it just as feasible to consonantalize a vowel
by rule as it is in the semantic component to nominalize a verb by rule.
In short, the use of junction trees to represent articulatory structure
brings a great deal of descriptive power to bear, should we need it.
Thus we supplant D and W with more descriptively specific node labels
and append to them some element of their respective vocabularies as terminal
units, as illustrated by Figure 7.
Figure 7.
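The subcategorization of D and W can be written down as two label sets and a simple well-formedness check. This is a sketch under stated assumptions: the set and function names are hypothetical, and the tentative case of continuant obstruents as nuclei (as in "pssst") is deliberately left out.

```python
# Delimiter labels: obstruent consonant, liquid, glide, null.
DELIMITER_LABELS = {"C", "L", "G", "∅"}
# Nucleus labels: vowel, liquid. Continuant obstruents are omitted here.
NUCLEUS_LABELS = {"V", "L"}


def is_well_formed(initial_label, nucleus_label, final_label):
    """Check that a labelled syllable fits the D W D template."""
    return (initial_label in DELIMITER_LABELS
            and nucleus_label in NUCLEUS_LABELS
            and final_label in DELIMITER_LABELS)


print(is_well_formed("C", "V", "C"))   # a consonant-vowel-consonant syllable
print(is_well_formed("∅", "L", "∅"))   # a syllabic liquid with null delimiters
print(is_well_formed("V", "C", "V"))   # a vowel cannot delimit without a rule
```

Note that L appears in both sets, so the label alone does not say whether a liquid is acting as delimiter or nucleus; its position in the tree does.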
The significance of V2 and V3 as non-terminal labels is that of
semi-syllable and syllable, respectively. Bear in mind that the operation
symbols appearing between operands are representative of the articulatory
junctions (transitions) between them. Hence non-terminal nodes symbolize
articulatory sequences consisting of the phonemes they dominate plus the
transitions necessary to account for continuous movement from one distinctive
vocal tract state to the next. This signifies, in effect, that given a
junction instruction of the form X ∘ Y = Z, there exists a transition
T = ∘(X, Y), such that XTY is a continuous articulatory sequence Z
consisting of the distinctive units X and Y mediated by transitional T. This
aspect of the formulation is advanced as an attempt to satisfy the need for
phonological notation potentially capable of explicating both the discrete
segmental elements of which the speech chain is composed, and the
co-articulatory transitions which connect them in live speech. The practical
effect of the formulation is that one's attention is drawn not to a relatively
limited set of radical phonological changes, but to the co-articulatory
effect of every junction on its operands, regardless of its subtlety. This
is important if high quality synthetic speech is to be achieved.
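The idea that every junction contributes a transition between its operands can be sketched as follows. Sequences are plain lists of strings, and the transition is only a named placeholder; the functions `join` and `named_transition` are hypothetical, and a synthesis system would compute real articulatory or acoustic parameters instead.

```python
def join(x, y, transition):
    """Return Z = X T Y, where T = transition(last unit of X, first unit of Y)."""
    t = transition(x[-1], y[0])
    return x + [t] + y


def named_transition(left, right):
    """Placeholder transition: records only which units it mediates."""
    return f"<{left}~{right}>"


# Semi-syllable first, then the full syllable, for a syllable b-ae-m.
semi_syllable = join(["b"], ["ae"], named_transition)
syllable = join(semi_syllable, ["m"], named_transition)
print(semi_syllable)
print(syllable)
```

Every junction, however minor, leaves an explicit transition in the output, which is the property the text singles out as important for high quality synthetic speech.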
Delimiting Clusters 
Both initial and final syllable delimiters frequently consist of
clusters of segments rather than discrete segments. An analysis of such
clusters shows that notable assimilative forces are involved. We view
this as a form of articulatory subordination, and, consequently, use
subjunction as the basic junction type for treating such clusters. The
fact that articulation trees are capable of showing a variety of compositional
arrangements makes it possible to give whatever internal structure for
such clusters as seems to be operative. Thus for strand, where tr seem to
be more closely associated than st, this can be explicitly represented.
(See Figure 8.)
Articulation tree for strand
Figure 8.
Multi-syllable Words 
Let us now consider how multi-syllable words may be given in the form
of articulation trees. The procedure, briefly, is as follows, using
Bambi and Donna as the words to be diagrammed:
(1) The syllables are identified: BAM-BI [bæm - bi], DON-NA [dan - na].
(2) The syllables are diagrammed using the appropriate adjunction type.
(3) An interjunction is constructed using syllable-final and syllable-initial
constituents. (The label node is given as C since b seems
to exert assimilative force over m.)
(4) The label node of the subjunction attaches to the more heavily-stressed
syllable.
(5) The initial delimiter of the more weakly-stressed syllable becomes the
intersect node.
[Articulation trees for Bambi and Donna, each marking a main syllable and a
subordinate syllable under V3 nodes; diagrams not recoverable from the source.]
An interesting result of the notation is that stress is no longer
a property of vowels, but of entire syllables, i.e. the delimiters and the
vowel. Further, stress reflects a relation between constituents, so that
no features expressing stress values are necessary.
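The claim that stress is a relation between constituents rather than a feature can be illustrated with a small sketch in which relative stress is read off the structure. The class `Subjoined`, the function `stress_ranks`, and the choice of the first syllable of Bambi as the main syllable are assumptions for illustration; linear order and the interjunction of steps (3) to (5) are not modelled.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class Subjoined:
    """A unit in which `main` outranks `subordinate`."""
    main: Union["Subjoined", str]
    subordinate: Union["Subjoined", str]


def stress_ranks(unit, rank=0) -> List[Tuple[str, int]]:
    """Derive relative stress from structure: 0 is the ranking syllable,
    and each level of subordination adds one."""
    if isinstance(unit, str):
        return [(unit, rank)]
    return (stress_ranks(unit.main, rank)
            + stress_ranks(unit.subordinate, rank + 1))


bambi = Subjoined(main="baem", subordinate="bi")
print(stress_ranks(bambi))
```

No syllable carries a stress value of its own; the ranks are computed from who is subordinate to whom, which is the sense in which stress features become unnecessary.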
Phrases
Phrases are diagrammed by introducing prosodic constituents (B) to
which word-trees are subordinated. (Refer to Figure 5.) The ranking syllable,
i.e. the one receiving primary stress, joins to the prosodic constituent.
The notation is intended to reflect the simultaneous execution of segmental
and supra-segmental units during the articulatory process, in a way
comparable to the multitudinous internal manipulations of an engine as one
turns a crank. The crank of the articulatory apparatus is the diaphragm
and other musculature which provide energy and assume other symbolically
significant states at certain intervals during the execution of the segmentals.
Prosodic constituents result in the specific intonational contours we hear
superimposed over syllables, words, and phrases.
While both segmental and suprasegmental constituents are coded in
the context of semantic data, we emphasize again that A-trees contain only
articulatory data. Thus, if A-trees are compared to the customary
representations of generative phonology, as typified by those given by
Chomsky and Halle⁹ (compare Figures 5 and 9), it will be noted that the
syntacto-semantic superstructure of the regular trees is replaced by an
articulatory superstructure in the A-trees. This departure from standard
practice is motivated not only by the requirement
imposed by the theory (that data types not be intermingled), but also by
the observation that the regular trees tend to neglect prosodic articulatory
phenomena. When information relating to these phenomena is incorporated into
articulation trees, it replaces the usual superstructure of S's, NP's,
and other similar labels in a natural way. The prosodic constituents thus
introduced are comparable in their function to the intonation contours
associated by rule with segmental sequences in the system proposed by Leben.¹⁰
[Figure 9. Representation of telegraphic communication with boundary symbols,
after Chomsky and Halle; tree diagram not recoverable from the source.]
Functional Versus Categorial Information
The proposed system of phonological description makes possible an
interesting hypothesis regarding many of the features used in current
descriptions. Specifically, if A-trees are in some sense a reflection of
actual articulatory processes, then phonological representations which do
not use trees will consist of an intermixture of functional and categorial
labels (features). For example, if trees are used to represent the relations
between subject, verb, and object, it is not necessary to label the subject
as such or the object as such, since structural relations make these notions
explicit. If trees were not used to represent sentence structure, however,
functional labels would have to be used.
Similarly, it follows that if trees are an appropriate medium for
phonological description, but have not been used, then functional and
categorial information are intermingled in current descriptions. If this is
true, then it should be possible to abstract functional information away
(and consequently not write it in feature form) by elaborating A-tree
notation.
While the proposed system is still in its infancy, so to speak, some
interesting initial observations in this regard can be made at this time.
First, major category features become node labels in a natural way, thus
suggesting why the formal illusion exists that a change, for example, of
[+cons] → [-cons] is equal in magnitude to a change of [+voice] → [-voice].
Second, [±syllabic] ([±consonantal] and [±vocalic] are also used in some
systems) are functional labels and need not be written if syllables are
given as tree structures. Third, stress at the segmental level and
unmarked pitch at the prosodic level become implicit in structure in terms
of the rank of operands in articulatory subjunction and need not be
specified by feature. While it is beyond the scope of this paper to
elaborate this point further, it is without doubt the most interesting
and provocative consequence of the research to date.

REFERENCES 
Allen, J. (1976) "Synthesis of Speech from Unrestricted Text,"
Proc. IEEE, Vol. 64, No. 4, pp. 433-442, April 1976

Atal, B.S. and S.L. Hanauer (1971) "Speech analysis and synthesis by
linear prediction of the speech wave," J. Acoust. Soc. Amer.,
Vol. 50, No. 2, pp. 637-655

Flanagan, J.L. et al. (1968) "Self-Oscillating Source for Vocal-Tract
Synthesizers," IEEE Transactions on Audio and Electroacoustics,
Vol. AU-16, No. 1, March 1968 (see esp. p. 60)

Flanagan, J.L. et al. (1975) "Synthesis of Speech From a Dynamic Model
of the Vocal Cords and Vocal Tract," The Bell System Technical
Journal, Vol. 54, No. 3, pp. 485-505, March 1975

Haavel, R. (1976) "Temporal characteristics of the Pitch Contour,"
Acustica, Vol. 34, pp. 14~3-157 

Halliday, M.A.K. (1970) "Functional diversity in language as seen from a
consideration of modality and mood in English," Foundations
of Language, Vol. 6, pp. 322-361

Isacenko, A. and H. Schadlich (1970) A Model of Standard German Intonation
(The Hague: Mouton)

Leben, W. (1976) Manuscript of an article to appear in Linguistic Analysis,
No. 2, 1976

Levine, A. (1976) Report on Prosodic Research at IMSSS, Stanford
University, a preliminary draft of a forthcoming technical
report, received March 1976

Lytle, E.G. (1974) A Grammar of Subordinate Structures in English,
(The Hague: Mouton)

Lytle, E.G. et al. (1975) "Junction Grammar as a base for natural language
processing," American Journal of Computational Linguistics,
microfiche No. 26

Lytle, E.G. (1976) "Junction Theory as a base for dynamic phonological
representation," Brigham Young University Linguistics
Symposium, March 1976

Melby, A. et al. (1975) "Modifying Fundamental Frequency Contours," a
paper presented at the 90th Meeting of the Acoustical Society
of America, November 1975

Melby, A. et al. (1976) "Generating Pitch Contours from Syntacto-Semantic
Representations," Brigham Young University Linguistics
Symposium, March 1976

Olive, J.P. (1975) "Fundamental frequency rules for the synthesis of
simple declarative English sentences," J. Acoust. Soc. Amer.,
Vol. 57, No. 2, February 1975

Umeda, N. et al. (1975) "The Parsing Program for Automatic Text-to-
Speech Synthesis Developed at the Electrotechnical Laboratory
in 1968," IEEE Transactions on Acoustics, Speech and Signal
Processing, Vol. ASSP-23, No. 2, April 1975

Umeda, N. (1976) "Linguistic Rules for Text-to-Speech Synthesis,"
Proc. IEEE, Vol. 64, No. 4, April 1976

1 Peter F. MacNeilage, "Linguistic Units and Speech Production,"
an invited paper presented at the 85th meeting of the Acoustical Society
of America, Boston, Massachusetts, April 13, 1973.

3 Philip Lieberman, "Towards a Unified Phonetic Theory," Linguistic
Inquiry, Vol. I, No. 3 (July, 1970), 307-322.

4 Farid M. Onn, "Speech Chain as an Analysis-By-Synthesis Model:
A Review," Studies in Linguistic Sciences, Vol. IV, No. 2 (Fall, 1974),
168.

5 Eldon G. Lytle, A Grammar of Subordinate Structures in English,
The Hague: Mouton and Co., 1974.

5b Eldon G. Lytle, "Structural Derivation in
Russian," unpublished Ph.D. dissertation, University of Illinois
(Champaign-Urbana), 1973.

6 Charles J. Fillmore, "The Case for Case," Universals in Linguistic
Theory, ed. Emmon Bach and Robert T. Harms (Holt Rinehart, 1968), pp. 1-89.

8 Fredericka Bell-Berti and Katherine S. Harris, "Some Acoustic
Measures of Anticipatory and Carryover Coarticulation."

9 Noam Chomsky and Morris Halle, The Sound Pattern of English, New
York: Harper and Row, 1968.

10 William R. Leben, "The Tones of English Intonation," to appear in
Linguistic Analysis, 2, 1976.
