Integrating Linguistic and Performance-Based Constraints for Assigning
Phrase Breaks
Michaela Atterer
Institute of Natural Language Processing
University of Stuttgart, Germany
atterer@ims.uni-stuttgart.de
Ewan Klein
Division of Informatics
University of Edinburgh, UK, and
Edify Internat. Development Centre, UK
ewan@cogsci.ed.ac.uk
Abstract
The mapping between syntactic structure and
prosodic structure is a widely discussed topic in
linguistics. In this work we use insights gained
from research on syntax-to-prosody mapping in or-
der to develop a computational model which assigns
prosodic structure to unrestricted text. The result-
ing structure is intended to help a text-to-speech
(TTS) system to predict phrase breaks. In addition
to linguistic constraints, the model also incorporates
a performance-oriented parameter which approxi-
mates the effect of speaking rate. The model is rule-
based rather than probabilistic, and does not require
training. We present the model and implementa-
tions for both English and German, and give eval-
uation results for both implementations. We then
examine how far the approach can account for the
different break patterns which are associated with
slow, normal and fast speech rates.
1 Introduction
Normal spoken language is not delivered in an un-
interrupted monotone; prosodic cues such as pauses
or boundary tones greatly help the listener to un-
derstand an utterance. Most text-to-speech systems
use statistical models to find the appropriate loca-
tions for prosodic phrase breaks. In this work we
use insights gained from the linguistics literature
to develop a computational model which assigns
prosodic structure to unrestricted text.
We start by briefly reviewing the relationship be-
tween syntactic and prosodic structure. Figure 1
shows an example of the right-branching syntactic
structure that is standardly assigned to English sen-
tences. Figure 2 shows a much flatter tree which
corresponds to widely accepted views of the same
sentence’s prosodic structure. According to the lat-
ter, the Utterance level is partitioned into intona-
[tree diagram for "He would tease a little girl who didn't like big dogs", with IP, I′, VP, NP, N′ and Det nodes]
Figure 1: Syntactic structure of a sentence.
[prosodic tree: the same sentence divided into four φ-phrases, grouped into two I-phrases under a single Utterance node]
Figure 2: Prosodic structure of a sentence.
tional (I-) phrases,1 which in turn are partitioned
into phonological (φ-) phrases. (We ignore lower
levels of representation such as prosodic words and
syllables for the purposes of this paper.)
In their investigation of the syntax-prosody map-
ping, Nespor and Vogel (1986) define φ-phrases as
consisting of a lexical head (e.g., a verb, noun or
adjective) together with all the material on its non-
recursive side up until the next head.2 In the ex-
1Intonational phrases are phonologically defined as units
which are associated with a characteristic intonational contour;
in particular, an I-phrase is marked by the presence of a major
pitch accent. The boundary of an I-phrase is canonically man-
ifested as a perceptible pause, accompanied by a local fall or
rise in F0 (fundamental frequency); it can also be marked by
constituent-final syllable lengthening, and stronger articulation
of constituent-initial consonants.
2Here, ‘nonrecursive’ is intended to cover modifiers and de-
ample of Figures 1 and 2 tease, little, girl, like, big
and dogs are lexical heads. These heads—barring
the adjectives—are bundled with the material to
their left. The adjectives are included in the same
φ-phrases as the nouns they modify because they
are still inside the maximal projection (NP) of the
nouns.
The level of φ-phrases can fairly easily be de-
rived from syntax. However, the same is not true
of I-phrases. According to the strict layer hypothe-
sis (Selkirk, 1984), an intonational phrase must con-
sist of complete φ-phrases. But syntax does not de-
termine how many φ-phrases go to make up an I-
phrase. To illustrate this point, consider (1) and (2), dis-
cussed by Gee and Grosjean (1983), where '//' is
used to indicate I-phrase boundaries. Both phras-
ings are acceptable.
(1) By making his plan known // he brought out //
the objections of everyone. //
(2) By making his plan known // he brought out the
objections of everyone. //
Nevertheless, the φ-structure provides a strong con-
straint on the location of breaks between I-phrases,
since an I-phrase can never interrupt a φ-phrase.
Although φ-structure has been used by others to
assign prosodic structure algorithmically (Gee and
Grosjean, 1983; Bachenko and Fitzpatrick, 1990),
there is no generally accepted method for bundling
φ-phrases into I-phrases. The main consensus is
that I-phrases have "a more or less uniform 'aver-
age' length" (Nespor and Vogel, 1986, p.194). In a
similar vein, Gee and Grosjean (1983) observe that
utterances tend to be split into two or three I-phrases
of roughly equal length.
Gee and Grosjean (1983) (and subsequently,
Bachenko and Fitzpatrick (1990)) construct I-
phrases by comparing the length of the prosodic
constituents on both the left-hand side and the right-
hand side of the utterance's main verb (or the φ-
phrase containing the verb), and grouping the verb
with the shorter neighbouring constituent. They
give little consideration to the grouping of con-
stituents which are not adjacent to the verb. This
limitation in their model seems innocuous when
dealing with the rather artificially ‘well-behaved’
set of sentences in their sample. (This 14 sentence
terminers as opposed to complements. It is also required that
the ‘next head’ referred to in the definition be outside the max-
imal projection of the head which forms the basis of the φ-
phrase.
corpus, also used by Bachenko and Fitzpatrick, only
contains sentences of 11–13 words in length, and
the approach does not scale up to unrestricted text.) However,
to be useful in a realistic TTS system our model
should robustly run with unrestricted text and not
rely – like Bachenko and Fitzpatrick’s model – on
a correct parser output. Consequently, we need to
adopt a different strategy.
2 The computational model
Our initial English model was developed within
the framework of the LT TTT tokenization toolkit
(Grover et al., 2000): this provides a modular and
configurable pipeline architecture in which various
components incrementally add XML markup to the
input text stream. More details of the implementa-
tion can be found in (Atterer, 2002). In principle the
algorithm consists of two main steps, each of which
in turn is broken down into two further steps:
Step 1 Assignment of φ-phrases
1. Chunking
2. Restructuring of chunks to build φ-phrases
Step 2 Bundling of φ-phrases into intonational
phrases (“Insert Phrase Breaks”)
1. Insertion of breaks using punctuation
2. Insertion of further breaks using balanc-
ing and length constraints
The first important step is to identify φ-phrases.
Although we require some syntactic markup as in-
put to constructing these, a full parse is not nec-
essary. Instead, we carry out a shallow parse us-
ing a chunker. For English, we use Abney’s Cass
chunker.3 Cass builds syntactic structure incremen-
tally starting with a level of simple chunks and then
building various levels of more complex phrases
above them. Phrases of each level are constructed
non-recursively out of constituents of the previous
level. For this work we only use the lowest level of
units such as nx (noun chunk) and vx (verb chunk),
as illustrated in (3).
(3) <nx>Their presence</nx> <vx>has
enriched</vx> <nx>this univer-
sity</nx> and <nx>this country</nx>,
and <nx>many</nx> <vx>will re-
turn</vx> <nx>home</nx> <inf>to en-
hance</inf><nx>their own nations</nx>.
3Cass is available at http://www.research.att.
com/~abney/
Abney's definition of chunk is very similar to
Nespor and Vogel's notion of φ-phrase: "roughly
speaking, a chunk is the non-recursive core of an
intra-clausal constituent, extending from the begin-
ning of the constituent to its head, but not includ-
ing post-head constituents" (Abney, 1996). Chunks
defined in this way map almost directly into our φ-
phrases, except that we also include in the φ-phrase
any unchunked material on the left boundary of the
chunk. For example, the sequence and <nx>this
country</nx> in (3) is converted into a single φ-
phrase.
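The conversion from chunks to φ-phrases can be sketched as follows. This is a minimal illustration in Python; the function name and the (text, label) input representation are our own assumptions rather than part of the actual LT TTT implementation.

```python
def build_phi_phrases(segments):
    """Build phi-phrases from chunker output.

    segments is a list of (text, label) pairs, where label is a chunk
    type such as "nx" or "vx", or None for unchunked material.  Each
    chunk becomes a phi-phrase, and any unchunked material to its left
    is folded into the same phrase.
    """
    phrases, pending = [], []
    for text, label in segments:
        if label is None:
            pending.append(text)          # hold until the next chunk
        else:
            phrases.append(" ".join(pending + [text]))
            pending = []
    if pending:                           # trailing unchunked material
        phrases[-1] += " " + " ".join(pending)
    return phrases

# The start of example (3):
segments = [("Their presence", "nx"), ("has enriched", "vx"),
            ("this university", "nx"), ("and", None),
            ("this country", "nx")]
print(build_phi_phrases(segments))
```

The unchunked conjunction "and" ends up in the same φ-phrase as the following chunk, yielding the phrase "and this country" as described above.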
For the German version of the model, we used
a chunker developed by Helmut Schmid (work in
progress) and carried out some subsequent restruc-
turing of the chunker’s output. The four main mod-
ifications to the chunk structure are as follows.
1 In German, as opposed to English, the auxil-
iary can be separated from the verb/verb group it
belongs to. That is, a complement or modifier can
split the verb chunk, and consequently the chunker
builds two separate verb chunks. Since the auxiliary
does not count as a lexical head, we delete the chunk
boundary after it. This is illustrated by examples (4)
and (5) where the deletion of the chunk boundary af-
ter the auxiliary hat results in the φ-phrase hat den
Führungsstreit.
(4) <nx> Der nordrhein-westfälische Minister-
präsident </nx> <nx> Rau </nx> <vx>
hat </vx> <nx> den Führungsstreit </nx>
<px> bei <nx> den Sozialdemokraten
</nx></px> <vx> kritisiert </vx>.
(5) <phi> Der nordrhein-westfälische Min-
isterpräsident Rau </phi><phi> hat
den Führungsstreit </phi><phi> bei den
Sozialdemokraten kritisiert. </phi>
2 Proper names, which are often output as sepa-
rate chunks by the chunker, are attached to a pre-
ceding noun. In (5) the name Rau has been attached
to the preceding noun chunk of (4).
3 Verb particles at the end of sentences are at-
tached to the preceding chunk. Such verb particles
are in fact part of verbs, but are sometimes sepa-
rated from the verb stem, e.g. the particle auf from
the verb aufgeben (to give up) in the sentence Er gab
seinen Plan auf. (Lit: He gave his plan up.) In ex-
ample (7) the particle ab is attached to the preceding
chunk of (6).
(6) <nx> Die weitere Entwicklung </nx><px>
in <nx> den kommenden Jahren </nx>
</px><vx> hänge </vx><px> von <nx>
den unternehmerischen Qualitäten </nx>
</px><vx> ab </vx>.
(7) <phi> Die weitere Entwicklung
</phi><phi> in den kommenden Jahren
</phi><phi> hänge </phi><phi>
von den unternehmerischen Qualitäten ab .
</phi>
4 Phrase-final verb chunks which consist of only
one word are also attached to the preceding mate-
rial. This is also illustrated by (4) and (5) where the
final verb chunk consisting only of the past partici-
ple kritisiert is included in the same φ-phrase as the
preceding chunk.
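The restructuring rules can be sketched as a simple post-processing pass over the chunk list. The following Python fragment illustrates modifications 3 and 4 only (attaching a one-word sentence-final verb chunk, such as the particle ab or the participle kritisiert, to the preceding chunk); the (label, words) pair representation is a hypothetical stand-in for the chunker's actual output format.

```python
def attach_final_verb_chunk(chunks):
    """chunks: list of (label, words) pairs for one sentence.

    Sketch of modifications 3 and 4: a sentence-final verb chunk that
    consists of a single word (a separated particle such as "ab", or a
    lone past participle such as "kritisiert") is attached to the
    preceding chunk.
    """
    if (len(chunks) >= 2 and chunks[-1][0] == "vx"
            and len(chunks[-1][1]) == 1):
        label, words = chunks[-2]
        return chunks[:-2] + [(label, words + chunks[-1][1])]
    return chunks

# The end of example (6): <vx> hänge </vx> <px> von ... </px> <vx> ab </vx>
chunks = [("vx", ["hänge"]),
          ("px", ["von", "den", "unternehmerischen", "Qualitäten"]),
          ("vx", ["ab"])]
print(attach_final_verb_chunk(chunks))
```

After this pass the particle ab belongs to the preceding chunk, matching the φ-phrase von den unternehmerischen Qualitäten ab in example (7).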
After identifying break options in the form of φ-
phrases, we have to bundle these constituents into
intonational phrases. As mentioned before, there
is observational evidence that utterances should be
divided into intonational phrases of roughly equal
length. Examining the Spoken English Corpus
(SEC), Knowles et al. (1996a, p.111) found that
speakers insert breaks after about five syllables in
most of the cases and that they almost never utter
more than 15 syllables without a break.
Our algorithm will thus contain a threshold pa-
rameter which sets an upper bound on the length
of I-phrases. This value is used to calculate the
optimum length of the I-phrases for particular sen-
tences. Even though the threshold sets an upper
bound, it is not a rigid one: an I-phrase can become
longer in some cases. This is similar to cases in
which a speaker would like to pause and maybe take
a breath, but has to utter a few more words in order
to complete a chunk.
As we mentioned before, we envisage our sys-
tem as forming one component of a TTS system,
and therefore it is reasonable to expect punctu-
ation in the input. This information provides a
hard initial constraint on the formation of I-phrases;
commas and periods always correspond to I-phrase
boundaries. Once we have identified these I-phrase
boundaries, the resulting segments are further sub-
divided by applying the following procedure.
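Before the subdivision procedure is applied, punctuation thus yields a hard initial segmentation. A minimal sketch, assuming plain text input (the regex and function name are our own):

```python
import re

def split_at_punctuation(text):
    """Commas and periods always correspond to I-phrase boundaries,
    so they give a hard initial segmentation of the input text."""
    segments = re.split(r"(?<=[,.])\s+", text)  # split after , or .
    return [s for s in segments if s]

print(split_at_punctuation(
    "Their presence has enriched this university and this country, "
    "and many will return home to enhance their own nations."))
```

Each resulting segment is then a candidate for further subdivision.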
Insert Phrase Breaks
If the number of syllables ns in an intonational
phrase is greater than threshold th, then
(a) Calculate the number of desired breaks
db = ns/th and the optimum length
ol of each new intonational phrase ol =
ns/(db + 1).
(b) Determine the location of each new break
starting at the beginning of an intonational
phrase, counting ol syllables forward,
and carrying on until the end of the cur-
rent φ-phrase. This is performed db times
for the obligatory intonational phrase.
So a threshold of 13, for instance, turns the struc-
ture shown in example (3) into the one shown in (8),
where breaks are marked by '//', and turns the struc-
ture in example (5) into the one shown in example
(9).
(8) Their presence has enriched this university //
and this country, // and many will return home
// to enhance their own nations. //
(9) Der nordrhein-westfälische Ministerpräsident
Rau // hat den Führungsstreit bei den
Sozialdemokraten kritisiert. //
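The Insert Phrase Breaks procedure can be written out roughly as follows; the list-of-syllable-counts representation, the use of integer division for db, and the example counts are our assumptions for illustration, not details fixed by the paper.

```python
def insert_phrase_breaks(phi_syllables, threshold):
    """phi_syllables: syllable count of each phi-phrase in one
    (punctuation-delimited) intonational phrase.  Returns the indices
    of phi-phrases after which a new break is inserted."""
    ns = sum(phi_syllables)
    if ns <= threshold:
        return []                      # short enough: no extra breaks
    db = ns // threshold               # number of desired breaks
    ol = ns / (db + 1)                 # optimum I-phrase length
    breaks, count, target = [], 0, ol
    for i, length in enumerate(phi_syllables[:-1]):
        count += length
        # read on past the optimum length until the current
        # phi-phrase is complete, then insert a break
        if count >= target and len(breaks) < db:
            breaks.append(i)
            target += ol
    return breaks

# illustrative syllable counts for four phi-phrases, threshold 13
print(insert_phrase_breaks([12, 4, 7, 8], 13))
```

Because breaks may only fall at φ-phrase boundaries, an I-phrase can run over the threshold, mirroring a speaker who would like to pause but finishes the current φ-phrase first.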
We tried modifying the last step so that the al-
gorithm could return to the beginning of the current
φ-phrase if this was closer than the end. Interest-
ingly, this produced slightly worse results, which
supports our belief that the original algorithm is
closer to what humans seem to do: reading on until
they feel that a break is necessary, but not inserting
a break until they have completed the current φ-phrase.
3 Evaluation Results
We have already alluded to the fact that often there
are several equally acceptable possibilities for as-
signing prosodic structure to a given stretch of
text. Consequently, the very notion of evaluating a
phrase-break model against a gold standard is prob-
lematic as long as the gold standard only represents
one out of the space of all acceptable phrasings.
Nevertheless, we have adopted the standard evalua-
tion methodology in the absence of a more suitable
alternative.
The English model was evaluated using a test cor-
pus of 8,605 words taken from the Spoken English
Corpus (SEC) (Knowles et al., 1996b).4 Our test
corpus comprises 6 randomly selected texts from 6
4The SEC is available from http://www.hd.uib.no/
icame/lanspeks.html and consists of approximately 52k
words of contemporary spoken British English drawn from var-
ious genres. The material is available in orthographic and
prosodic transcription (including two levels of phrase breaks)
and in two versions with grammatical tagging.
different genres. We calculated recall and precision
values. Recall is the percentage of breaks in the cor-
pus that our model finds:

    recall = (B − D) / B × 100%

where B is the total number of breaks in the test cor-
pus and D is the number of deletion errors (breaks
which the model does not assign, even though they
are in the test corpus). Precision is the percentage
of breaks assigned by the model which is correct
according to the corpus:

    precision = (S − I) / S × 100%

where S is the total number of breaks which our
model assigns to the corpus and I is the number of
insertion errors (breaks that the model assigns even
though no break occurs in the test corpus). We also
calculated the F-score:

    F = (2 × precision × recall) / (precision + recall)
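These three measures are straightforward to compute from the error counts; a minimal sketch:

```python
def evaluation_scores(B, D, S, I):
    """B: breaks in the test corpus, D: deletion errors,
    S: breaks assigned by the model, I: insertion errors."""
    recall = (B - D) / B * 100
    precision = (S - I) / S * 100
    f_score = 2 * precision * recall / (precision + recall)
    return recall, precision, f_score

# e.g. a model that misses 30 of 100 corpus breaks and wrongly
# inserts 30 of its own 100 predicted breaks
print(evaluation_scores(100, 30, 100, 30))
```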
The results for running the English version of the
model with selected thresholds are shown in Ta-
ble 1. Increasing the threshold decreases the number
Recall Precision F-score
th = 4 83 59 69
th = 6 75 66 70
th = 7 73 69 71
th = 8 70 70 70
th = 13 62 79 69
punctuation only 50 92 65
Taylor & Black 79 72 75
Table 1: Results on SEC Corpus
of breaks that the model assigns: recall goes down,
and precision goes up. Decreasing the threshold re-
sults in more overgeneration, with recall going up
and precision going down. A threshold of 7 pro-
duced the best overall results. Reducing the thresh-
old below 5 or increasing it above 12 results in an
overall F-score below 70. However, this is not true
for certain individual texts. One of the 6 texts we ex-
amined was the transcription of a public speech and
thus presumably delivered in a different way from,
for instance, a news broadcast. (Example 8 was taken
from this speech). Its F-score for a threshold of 13
was 71 while its F-score for a threshold of 7 was
only 68. Section 4 below contains further discus-
sion of the role played by the threshold parameter
in modelling performance.
For comparison, the table also shows the results
of two other approaches, namely a baseline model
which we ran on our test data and which only as-
signs breaks at punctuation marks, and Taylor and
Black's (1998) Markov model for English.5 It
should be mentioned that Taylor and Black’s model
was trained on the SEC corpus, part of which is used
for the evaluation here. It is thus optimized for this
corpus and has the disadvantage of being less gen-
eral than our model. Taylor and Black (1998, p.15)
report that recall dropped from 79% to 73% when
their model was tested on non-SEC data.
Recall Precision F-score
th = 13 93 96 94
Bachenko & Fitzp. 86 89 87
Table 2: Results on Gee & Grosjean Corpus
Table 2 gives the results of running the sys-
tem against the more homogeneous corpus (14 sen-
tences) of Gee and Grosjean, when restricted to
predicting major breaks (intra-sentential and inter-
sentential). For comparison, we also show the re-
sults reported by Bachenko and Fitzpatrick (1990)
from running their rule-based model on the same
corpus.6
The German version of the model was evaluated
using 7,409 words of the news corpus of the Insti-
tute of Natural Language Processing (IMS), Uni-
versity of Stuttgart (Rapp, 1998). News broadcasts
read by various speakers were hand-labelled with
two levels of breaks (Mayer, 1995). For the evalu-
ation we used all breaks without distinguishing be-
tween different levels. The results are shown in Ta-
ble 3. As a comparison, we also show the baseline
results using punctuation only, and results achieved
by Schweitzer and Haase (2000) using rule-based
approaches for German. The first set of results
by Schweitzer and Haase were obtained with a ro-
bust stochastic parser and a head-lexicalized proba-
bilistic context-free grammar, and the second set by
5Precision was calculated from the figures in Table 2 on p.
10 in their paper, assuming 1,404 breaks and 7,662 junctures as
stated on p. 4 there.
6These were calculated from the annotated sentences
in their appendix counting major intra-sentential and inter-
sentential breaks. Sentences with parsing errors were treated
as if no break had been assigned. A relatively high threshold
was picked because we only tried to account for major breaks,
and thus lower thresholds would cause too many insertion er-
rors.
Recall Precision F-score
th = 4 90 62 73
th = 5 86 66 75
th = 6 84 69 76
th = 7 80 71 75
th = 10 73 75 74
punctuation only 49 93 64
Schweitzer & Haase 1 86 66 75
Schweitzer & Haase 2 71 82 76
Table 3: Results on IMS Corpus
mapping from tag-sequences.
4 Accounting for prosodic breaks at
various speech rates
When speakers talk faster they use fewer breaks per
utterance, and when they talk more slowly they use
more breaks (Trouvain and Grice, 1999). This is
reminiscent of what our model does when we in-
crease and decrease the threshold parameter respec-
tively. Impressionistically, the algorithm was often
able to predict acceptable break patterns for various
threshold values, and the variation in threshold
seemed to reflect what speakers do when varying
their speech rate.
In order to capture this effect in a more formal
way we tried to evaluate the algorithm on a corpus
which was recorded at three different speech rates
(Trouvain and Grice, 1999). Three speakers (CZ,
PS and AT) read a German text of 108 words 3 times
slowly, 3 times at a normal rate, and 3 times at a fast
rate.
Trouvain and Grice show that reduc-
ing/increasing breaks is not the only prosodic
correlate of changing speech rate; for example,
speakers also reduce phone durations or pause du-
rations. The extent to which increasing/decreasing
the number of breaks correlates with speech rate
varies both within and across speakers. One of
the speakers, for instance, uses 23 breaks in her
first slow version, 28 in her second slow version,
and 26 in her third slow version. On average this
was definitely more than she used in her normal
versions (20, 20 and 24 respectively). To test our
algorithm we only used the slow version with the
largest number of breaks, the fast version with the
smallest number of breaks, and one of the normal
versions which was closest to the average of the
normal versions. We did this for each speaker.
We expected to see an effect of the slower ver-
sion being better modelled by low threshold param-
eters, and the fast versions by higher parameters.
It turned out, however, that the slow versions pro-
duced much lower recall/precision values compared
to the faster versions. This was because, when
producing their slow versions, the speakers
tended to insert breaks at positions which do not
correspond to our φ-phrase boundaries, such as im-
mediately after sentence-initial temporal adverbial
phrases (which are not marked by commas in Ger-
man). We would have needed a tagger which dis-
tinguishes adverbials of time from other adverbials
to account for this. Moreover, further changes in
the rules for the restructuring of chunks might have
been appropriate, such as preventing breaks before
any phrase-final verb chunks up to a certain length.
This expedient needs to be approached carefully,
however, since when we are trying to model such
a small corpus, there is a danger of ‘overfitting’ the
rule set in a way which fails to generalize properly
to more extensive corpora.
For the time being, we decided to manually carry
out the first step of the algorithm, namely the as-
signment of φ-phrases, in order to test whether the
heuristics are useful for modelling different speech
rates. We assigned a final φ-phrase boundary to
all those structural locations where we could find a
phrase break in more than one of our 27 spoken ver-
sions of the text. This resulted in a structure which
could in theory be found automatically if the neces-
sary information was available (e.g. explicitly anno-
tating adverbs of time).
Running the heuristics on this φ-structure did
indeed show some potential for imitating various
speech rates. Table 4 shows recall/precision pairs
for running the algorithm with the range of possi-
ble threshold values on a slow, normal and fast ver-
sion by speaker CZ. The grey shading in the table
shows the best values, i.e. where recall is greater
than 90.0% and precision is greater than 80.0%.7 It
does indeed appear that higher thresholds lead to a
better model of fast speech rates, and lower thresh-
olds are more appropriate for slow speech rates. The
7The model has a general tendency to assign higher recall
than precision values. Therefore we have to weigh precision
a little bit lower than recall (approximately in a ratio of 8:9)
to see the effect. For better readability we leave out the F-
scores, which also would only show the effect if weights were
included.
threshold slow normal fast
1–3 100.0/82.8 100.0/65.5 100.0/51.7
4 91.7/84.6 100.0/73.1 100.0/57.7
5–7 91.7/88.0 100.0/76.0 100.0/60.0
8 87.5/95.5 100.0/86.4 100.0/68.2
9–10 83.3/90.9 94.7/81.8 100.0/68.2
11–14 79.2/90.5 94.7/85.7 100.0/71.4
15–17 75.0/90.0 94.7/90.0 100.0/75.0
18–21 75.0/100.0 89.5/94.4 100.0/83.3
22–∞ 70.8/100.0 84.2/94.1 100.0/88.2
Table 4: Recall/precision values for one slow, one
normal, and one fast version of a text read by
speaker CZ.
threshold slow normal fast
1-3 92.9/89.7 100.0/65.5 100.0/58.6
4 82.1/88.5 94.7/69.2 100.0/65.4
5-7 82.1/92.0 94.7/72.0 100.0/68.0
8 75.0/95.5 94.7/81.8 100.0/77.3
9-10 75.0/95.5 94.7/81.8 100.0/77.3
11-14 71.4/95.2 94.7/85.7 100.0/81.0
15-17 71.4/100.0 94.7/90.0 100.0/85.0
18-21 64.3/100.0 94.7/100.0 100.0/94.4
22-∞ 60.7/100.0 89.5/100.0 94.1/94.1
Table 5: Like Table 4 but for speaker PS.
threshold slow normal fast
1-3 100.0/72.4 100.0/65.5 100.0/58.6
4 100.0/80.8 100.0/73.1 100.0/65.4
5-7 95.2/80.0 100.0/76.0 100.0/68.0
8 90.5/86.4 94.7/81.8 100.0/77.3
9-10 85.7/81.8 94.7/81.8 100.0/77.3
11-14 85.7/85.7 94.7/85.7 100.0/81.0
15-17 85.7/90.0 94.7/90.0 100.0/85.0
18-21 85.7/100.0 94.7/100.0 100.0/94.4
22-∞ 81.0/100.0 89.5/100.0 100.0/100.0
Table 6: Like Table 4 but for speaker AT.
tables for the other two speakers (Table 5 and Table
6) show the same tendency. They also reflect the
tendency of those two speakers to use the strategy
of varying the number of breaks to a lesser extent
than CZ when speeding up (cf. Trouvain and Grice
(1999)).
5 Discussion
Our heuristic can imitate the phrasing of various
speech rates by modifying a single threshold pa-
rameter: a slow speech rate is imitated by decreas-
ing it, and a fast rate by increasing it.
However, the results are not yet fully satisfactory,
because some of the steps of the overall procedure
for assigning phrase breaks were manually
corrected. It would be necessary to implement these
additional changes in the chunker rules, and exam-
ine whether they improve or degrade the overall
performance. The latter might be the case if they
are too genre-specific.
As we noted earlier, a more general problem is
that larger text corpora for the evaluation of dif-
ferent speech rates are not available. Another ap-
proach, which we would like to explore in future
work, would be to feed the output of the model into
a TTS system and measure human judgements of
acceptability.
6 Conclusion
We proposed a model that uses linguistic constraints
and a heuristic to assign phrase breaks to unre-
stricted text. The model does not need any training.
This is useful because training corpora marked with
intonational phrases are sparse, especially as far as
languages other than English are concerned. We
show that the model is adaptable to other languages.
Its performance is comparable to other phrase break
models, and there is still some leeway for improve-
ment. We tested how far a heuristic which is part of
the model is capable of capturing changes in speech
rate and gained promising results. This is significant
given the increasing interest in non-linear modelling
of speech rate within the speech synthesis commu-
nity.
Acknowledgements
We are grateful to Jürgen Trouvain for kindly mak-
ing his corpus available to us, and to three anony-
mous reviewers for their comments.

References

Steven Abney. 1996. Chunk stylebook. Avail-
able from http://www.research.att.
com/~abney/publications.html.

Michaela Atterer. 2002. Assigning prosodic struc-
ture for speech synthesis: a rule-based approach.
In Proc. of the Speech Prosody 2002 Conference,
Aix-en-Provence.

Joan Bachenko and Eileen Fitzpatrick. 1990.
A computational grammar of discourse-neutral
prosodic phrasing in English. Computational Lin-
guistics, 16(3):155–170.

James P. Gee and François Grosjean. 1983. Perfor-
mance structures: A psycholinguistic and linguis-
tic appraisal. Cognitive Psychology, 15:411–458.

Claire Grover, Colin Matheson, Andrei Mikheev,
and Marc Moens. 2000. LT TTT – a flexible to-
kenization tool. In Proceedings of Second Inter-
national Conference on Language Resources and
Evaluation (LREC 2000), pages 1147–1154.

Gerry Knowles, Anne Wichmann, and Peter Alder-
son, editors. 1996a. Working with Speech: Per-
spectives on Research into the Lancaster/IBM
Spoken English Corpus. Longman, London.

Gerry Knowles, Briony Williams, and Lita Taylor,
editors. 1996b. A Corpus of Formal British En-
glish Speech: The Lancaster/IBM Spoken English
Corpus. Longman, London.

Jörg Mayer. 1995. Transcription of German intona-
tion – the Stuttgart system. Technical report, Uni-
versity of Stuttgart.

Marina Nespor and Irene Vogel. 1986. Prosodic
Phonology. Number 28 in Studies in Generative
Grammar. Foris Publications, Dordrecht.

Stefan Rapp. 1998. Automatisierte Erstellung von
Korpora für die Prosodieforschung. Ph.D. thesis,
IMS, University of Stuttgart.

Antje Schweitzer and Martin Haase. 2000. Zwei
Ansätze zur syntaxgesteuerten Prosodiegener-
ierung. In Tagungsband der KONVENS 2000 -
Sprachkommunikation, Berlin. VDE-Verlag.

Elisabeth Selkirk. 1984. Phonology and Syntax.
The relation between sound and structure. MIT
Press, Cambridge, Mass.

Paul Taylor and Alan W. Black. 1998. Assign-
ing phrase breaks from part-of-speech sequences.
Computer Speech and Language, 12:99–117.

Jürgen Trouvain and Martine Grice. 1999. The ef-
fect of tempo on prosodic structure. In Proc. 14th
Intern. Confer. Phonetic Sciences, San Francisco.
