Comma Restoration Using Constituency Information
Stuart M. Shieber
Harvard University
shieber@deas.harvard.edu
Xiaopeng Tao
Harvard University
xptao@deas.harvard.edu
Abstract
Automatic restoration of punctuation from un-
punctuated text has application in improving
the fluency and applicability of speech recog-
nition systems. We explore the possibility that
syntactic information can be used to improve
the performance of an HMM-based system for
restoring punctuation (specifically, commas) in
text. Our best methods reduce sentence error
rate substantially — by some 20%, with an ad-
ditional 8% reduction possible given improve-
ments in extraction of the requisite syntactic in-
formation.
1 Motivation
The move from isolated word to connected speech recog-
nition engendered a qualitative improvement in the nat-
uralness of users’ interactions with speech transcription
systems, sufficient even to make up in user satisfaction
for some modest increase in error rate. Nonetheless, such
systems still retain an important source of unnaturalness
in dictation, the requirement to utter all punctuation ex-
plicitly. In order to free the user from this burden, a tran-
scription system would have to reconstruct the punctua-
tion from the word sequence. For certain applications —
for instance, transcription of naturally occurring speech
not originally targeted to a speech recognizer (as broad-
cast audio) — there is no alternative to performing recon-
struction of punctuation.
Reconstruction of different punctuation marks is likely
to respond to different techniques. Reconstruction of pe-
riods, question marks, and exclamation marks, for in-
stance, is in large part the problem of sentence bound-
ary detection. In this paper, we address the problem of
comma restoration. The published literature on intrasen-
tence punctuation restoration is quite limited, the state of
the art represented by Beeferman, Berger, and Lafferty’s
CYBERPUNC system, which we review in Section 2, and
reimplement as a baseline for our own experiments. (See
Section 5 for discussion of related work.)
The CYBERPUNC system uses a simple HMM with tri-
gram probabilities to model the comma restoration prob-
lem. It is trained on fully punctuated text, and then
tested for precision and recall in reconstructing commas
in text that has had them removed. Our replication of the
trigram-based method yields a sentence accuracy of 47%.
However, the role of the comma in text is closely
related to syntactic constituency. Nunberg (1990) de-
scribes two main classes of comma: the delimiter comma,
which is used to mark off a constituent, and the sepa-
rator comma, which is inserted between conjoined ele-
ments with or without a conjunction. In both cases, one
expects to see commas at the beginning or end of con-
stituents, rather than in the middle. But this type of cor-
relation is difficult to model with a flat model such as an
HMM. For this reason, we explore here the use of syn-
tactic constituency information for the purpose of comma
restoration. We show that even very rarefied amounts of
syntactic information can dramatically improve comma
restoration performance; our best method accurately re-
stores 58% of sentences. Furthermore, even approximate
syntactic information provides significant improvement.
There is, of course, great variation in appropriate punc-
tuation of a single word stream.1 For this reason, inde-
pendent human annotators consider only about 86% of
the sentences in the test set to be correct with respect
to comma placement (Beeferman et al., 1998). Thus, a
move from 47% to 58% is a quite substantial improve-
ment, essentially a reduction in sentence error rate of
some 30%.
1In an old unattributed joke, an English professor asks some
students to punctuate the word sequence “Woman without her
man is nothing”. The male students preferred “Woman, without
her man, is nothing.” whereas the female proposed “Woman!
Without her, man is nothing.” No, it’s not funny, but it does
make the point.
                                                               Edmonton, May-June 2003
                                                             Main Papers , pp. 142-148
                                                         Proceedings of HLT-NAACL 2003
Digression: What’s Statistical Parsing Good For?
There has been a tremendous amount of research since
the early 1990’s on the problem of parsing using statisti-
cal models and evaluated by statistical measures such as
crossing brackets rate. Statistical parsing, like language
modeling, is presumably of interest not in and of itself but
rather by virtue of its contribution to some end-user appli-
cation. In the case of language modeling, speech recog-
nition is the leading exemplar among a large set of end-
user natural-language-processing applications that bene-
fit from the technology. Further, the statistical figures of
merit are appropriate just insofar as they vary more or less
continuously and monotonically with the performance of
the end-user application. Again, in the case of language
modeling, speech recognition error rate is generally ac-
knowledged to improve in direct relation to reduction in
cross-entropy of the language model employed.
For statistical parsing, it is much more difficult to say
what applications actually benefit from this component
technology in the sense that incremental improvements
to the technology as measured by the statistical figures
of merit provide incremental benefit to the application.
The leading argument for parsing a sentence is that this
establishes the structure upon which semantic interpreta-
tion can be performed. But it is hard to imagine in what
sense an 85% correct parse is better than an 80% correct
parse, as the semantic interpretation generated off of each
is likely to be wrong. Barring a sensible notion of “par-
tially correct interpretation” and an end-user application
in which a partially correct interpretation is partially as
good as a fully correct one, we would not expect statisti-
cal parsers to be useful for end user applications based on
sentence interpretation.2 In fact, to the authors’ knowl-
edge, the comma restoration results presented here are the
first instance of an end-user application that bears on its
face this crucial property, that incremental improvement
on statistical parsing provides incremental improvement
in the application.
2 The Trigram Baseline
As a baseline, we replicated the three-state HMM method
of Beeferman et al. (1998). In this section, we describe
that method, which we use as the basis for our extensions.
The input to comma restoration is a sentence x =
x1 : : :xn of words and punctuation but no commas. We
would like to generate a restored string y = y1 : : : yn+c,
which is the string x with c commas inserted. The se-
lected y should maximize conformance with a simple tri-
2We might expect nonstatistical parsers also not to be use-
ful, but for a different reason, their fragility. Rather than de-
livering partially correct results, they partially deliver correct
results. But that is a different issue.
w?w,
w,w_
w_w_
xi ,
xi ,
xi ,xi
xi
xi
p(xi | xi-2 xi-1)
p(xi | , xi-1)
p(xi | xi-2 xi-1) p(, | xi-1 xi)
p(xi | xi-1 ,) p(, | , xi)
p(xi | , xi-1) p(, | xi-1 xi)
p(xi | xi-1 ,)
Figure 1: Three-state HMM for decoding a comma-
reduced string x1    xn to its comma-restored form.
Transitions are labeled with their position-dependent
probabilities.
gram model:
y = argmaxy
n+cY
i=1
p(yi j yi 2yi 1)
We take the string x to be the observed output of an
HMM with three states and transition probabilities de-
pendent on output; the states encode the position of com-
mas in a reconstructed string. Figure 1 depicts the au-
tomaton. The start state (1) corresponds to having seen a
word with no prior or following comma, state (2) a word
with a following comma, and state (3) a word with a prior
but no following comma. It is easy to see that a path
through the automaton traverses a string y with probabil-
ity Qn+ci=1 p(yi j yi 2yi 1). The decoded string y can
therefore be computed by Viterbi decoding.3
This method requires a trigram language model p().
We train this language model on sections 02–22 of the
Penn Treebank Wall Street Journal data (WSJ)4, com-
prising about 36,000 sentences. The CMU Statistical
Language Modeling Toolkit (Clarkson and Rosenfeld,
1997) was used to generate the model. Katz smoothing
was used to incorporate lower-order models. The model
3As it turns out, the same computation can be done using a
two-state model. This automaton does not, however, lend itself
as easily to extensions.
4For consistency, we use the version of the Wall Street Jour-
nal data that was used by Beeferman et al. (1998) for their CY-
BERPUNC experiments. This comprises sections 02–23 of the
Wall Street Journal (the last of these being used as test data)
with minor variations from the Treebank version, for instance,
a small number of missing sentences and some variation in the
tags. Runs of the experiments below using the Treebank ver-
sions of the data yield essentially identical results.
was then tested on the approximately 2300 sentences of
WSJ Section 23. Precision of the comma restoration
was 71.1% and recall 55.2%. F-measure, calculated as
2PR=(P + R), where P is precision and R recall, is
62.2%. Overall 96.3% of all comma placement decisions
were made correctly, a metric we refer to as token accu-
racy. Sentence accuracy, the percentage of sentences cor-
rectly restored, was 47.0%. (These results are presented
as model 1 in Table 1.) This is the baseline against which
we evaluate our alternative comma restoration models.
Beeferman et al. present an alternative trigram model,
which computes the following:
y = argmaxy
n+cY
i=1
p(yi j yi 2yi 1)
(1  p(;j yi 2yi 1)) (yi)
where
 (yi) =
 0 y
i =;
1 otherwise
That is, an additional penalty is assessed for not placing
a comma at a given position. By penalizing omission of
a comma between two words, the model implicitly re-
wards commas; we would therefore expect higher recall
and correspondingly lower precision.5 In fact, the method
with the omission penalty (model 2 in Table 1), does have
higher recall and lower precision, essentially identical F-
measure, but lower sentence accuracy. Henceforth, the
models described here do not use an omission penalty.
3 Commas and Constituency
Insofar as commas are used as separators or delimiters,
we should see correlation of comma position with con-
stituent structure of sentences. A simple test reveals that
this is so. We define the start count sci of a string posi-
tion i as the number of constituents whose left boundary
is at i. The end count eci is defined analogously. For ex-
ample, in Figure 2, sc0 is 4, as the constituents labeled
JJ, NPB, S, and S start there; ec0 is 0. We compute the
end count for positions that have a comma by first drop-
ping the comma from the tree. Thus, at position 5, sc5
is 2 (constituents DT, NPB) and ec5 is 4 (constituents JJ,
ADJP, VP, S).
We expect to find that the distributions of sc and ec
for positions in which a comma is inserted should differ
from those in which no comma appears. Figure 3 reveals
that this intuition is correct. The charts show the per-
centage of string positions with each possible value of
sc (resp. ec) for those positions with commas and those
5Counterintuitively, Beeferman et al. (1998) come to the
opposite expectation, and their reported results bear out their
intuition. We have no explanation for this disparity with our
results.
S
S
NPB VP
JJ NN NNS VBP JJ PUNC, DT NN VBD PUNC.
Further   staff   cuts     are   likely       ,       the   spokesman   indicated     .
0              1        2         3      4                5            6                   7                8      9
ADJP NPB
VP
Figure 2: Sample tree, showing computation of sc and
ec values. The four constituents leading to ec5 = 4 are
shown circled, and the two leading to sc5 = 2 are shown
circled and shaded.
without. We draw the data again from sections 02–22 of
the Wall Street Journal, using as the specification for the
constituency of sentences the parses for these sentences
from the Penn Treebank. The distributions are quite dif-
ferent, hinting at an opportunity for improved comma
restoration.
The ec distribution is especially well differentiated,
with a cross-over point at about 2 constituents. We can
add this kind of information, a single bit specifying an ec
value of k or greater (call it ceci), to the language model,
as follows. We replace p(yi j yi 2yi 1) with the proba-
bility p(yi j yi 2yi 1 ceci). We smooth the model using
lower order models p(yi j yi 1 ceci), p(yi j ceci), p(yi).6
These distributions can be estimated from the training
data directly, and smoothed appropriately.
Adding just this one bit of information provides signifi-
cant improvement to comma restoration performance. As
it turns out, a k value of 3 turns out to maximize perfor-
mance.7 Compared to the baseline, F-measure increases
to 63.2% and sentence accuracy to 52.3%. This exper-
iment shows that constituency information, even in rar-
efied form, can provide significant performance improve-
ment in comma restoration. (Figure 1 lists performance
figures as model 3.)
Of course, this result does not provide a practical al-
gorithm for comma restoration, as it is based on a prob-
abilistic model that requires data from a manually con-
structed parse for the sentence to be restored. To make the
method practical, we might replace the Treebank parse
with a statistically generated parse. In the sequel, we use
Collins’s statistical parser (Collins, 1997) as our canon-
ical automated approximation of the Treebank. We can
train a similar model, but using ec values extracted from
6Alternative backoff paths, for instance backing off first to
p(yi j yi 2yi 1), exhibit inferior performance.
7With k = 2 (model 4), precision drops precipitously to
60.4%, recall stays roughly the same at 66.4%.
Inf
o
sour
ces
T
raining
T
esting
trigram
insertionpenalty
w ordclass
ec,threshold=2
ec,threshold=3
ec,nothreshold
stemmer
T reebank
Collinsparse,commas
Collinsparse,nocommas
T reebank
Collinsparse,commas
Collinsparse,nocommas
Modelnumber
precision
recall
F-measure
tok enaccurac y
sentenceaccurac y
reductioninsentence”error”
 
1
.711
.552
.621
.963
.470
.000
 
 
2
.684
.576
.625
.962
.457
-.033
 
 
 
 
.834
.511
.634
.967
.514
.113
 
 
 
 
.709
.559
.625
.963
.464
-.014
 
 
 
 
3
.752
.627
.683
.968
.523
.135
 
 
 
 
4
.604
.664
.633
.958
.435
-.091
 
 
 
 
 
.863
.563
.681
.971
.555
.217
 
 
 
 
8
.671
.780
.721
.967
.508
.097
 
 
 
 
 
10
.796
.704
.748
.974
.579
.280
 
 
 
 
 
 
11
.791
.711
.749
.974
.576
.271
 
 
 
 
5
.714
.588
.645
.964
.489
.049
 
 
 
 
.733
.557
.633
.964
.489
.048
 
 
 
 
 
.851
.532
.655
.969
.534
.164
 
 
 
 
9
.657
.674
.666
.963
.476
.015
 
 
 
 
 
12
.797
.626
.701
.970
.549
.204
 
 
 
 
 
 
.791
.631
.702
.970
.544
.190
 
 
 
 
6
.714
.581
.641
.964
.486
.041
 
 
 
 
7
.738
.609
.668
.967
.507
.095
 
 
 
 
.746
.602
.666
.967
.503
.085
 
 
 
 
 
.859
.556
.675
.970
.550
.205
 
 
 
 
.681
.728
.704
.966
.501
.080
 
 
 
 
 
.811
.672
.735
.973
.571
.260
 
 
 
 
 
 
.805
.677
.735
.973
.570
.256
Table 1: Performance of the various comma restoration models described in this paper.
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0 1 2 3 4 5 6 7 8
without comma
with comma
(a)
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0 1 2 3 4 5 6 7 8 9 10 11 12 13
without comma
with comma
(b)
Figure 3: Differential pattern of constituent starts and
ends for string positions with and without commas. Chart
(a) shows the percentage of constituents with various val-
ues of sc (number of constituents starting at the posi-
tion) for string positions with commas (square points)
and without (diamond points). Chart (b) shows the corre-
sponding pattern for values of ec (number of constituents
ending).
Collins parses of the training data, and use the model to
restore commas on a test sentence again using ec values
from the Collins parse of the test datum. This model,
listed as model 5 in Table 1, has an F-measure of 64.5%,
better than the pure trigram model (62.2%), but not as
good as the oracular Treebank-trained model (68.4%).
The other metrics show similar relative orderings.
In this model, since the test sentence has no commas
initially, we want to train the model on the parses of sen-
tences that have had commas removed, so that the model
is being applied to data as similar as possible to that on
which it was trained. We would expect, and experiments
verify (model 6), that training on the parses with com-
mas retained yields inferior performance (in particular,
F-measure of 64.1% and sentence accuracy of 48.6%).
Again consistent with expectations, if we could clairvoy-
antly know the value of ceci based on a Collins parse of
the test sentence with the commas that we are trying to
restore (model 7), performance is improved over model
5; F-measure rises to 66.8%.
The steady rise in performance from models 6 to 5 to
7 to 3 exactly tracks the improved nature of the syntac-
tic information available to the system. As the quality
of the syntactic information better approximates ground
truth, our ability to restore commas gradually and mono-
tonically improves.
4 Using More Detailed Syntactic
Information
4.1 Using full end count information
The examples above show that even a tiny amount of syn-
tactic information can have a substantive advantage for
comma restoration. In order to use more information,
we might imagine using values of ec directly, rather than
thresholding. However, this quickly leads to data spar-
sity problems. To remedy this, we assume independence
between the bigram in the conditioning context and the
syntactic information, that is, we take
p(yi j yi 2yi 1eci)  
p(yi j yi 2yi 1)p(yi j yi 1eci)
p(yi)
This model8 (model 8) has an F-measure of 72.1% due
to a substantial increase in recall, demonstrating that the
increased articulation in the syntactic information avail-
able provides a concomitant benefit. Although the sen-
tence accuracy is slightly less than that with thresholded
ec, we will show in a later section that this model com-
bines well with other modifications to generate further
8We back off the first term in the approximation as before,
and the second to p(yi j yi 1).
improvements.9
4.2 Using part of speech
Additional information from the parse can be useful in
predicting comma location. In this section, we incorpo-
rate part of speech information into the model, generating
model 10. We estimate the joint probability of each word
xi and its part of speech Xi as follows:
p(xi; Xi j xi 2; Xi 2; xi 1; Xi 1; ec)  
p(xi j xi 2xi 1ec)p(Xi j Xi 2Xi 1)
The first term is computed as in model 8, the second back-
ing off to bigram and unigram models. Adding a part of
speech model in this way provides a further improvement
in performance. F-measure improves to 74.8%, sentence
accuracy to 57.9%, a 28% improvement over the base-
line.
These models (8 and 10), like model 3, assumed
availability of the Treebank parse and part of speech
tags. Using the Collins-parse-generated parses still shows
improvement over the corresponding model 5: an F-
measure of 70.1% and sentence accuracy of 54.9%, twice
the improvement over the baseline as exhibited by model
5.
5 Related Work
We compare our comma restoration methods to those of
Beeferman et al. (1998), as their results use only textual
information to predict punctuation. Several researchers
have shown prosodic information to be useful in predict-
ing punctuation (Christensen et al., 2001; Kim and Wood-
land, 2001) (along with related phenomena such as dis-
fluencies and overlapping speech (Shriberg et al., 2001)).
These studies, typically based on augmenting a Marko-
vian language model with duration or other prosodic cues
as conditioning features, show that prosody information
is orthogonal to language model information; combined
models outperform models based on each type of infor-
mation separately. We would expect therefore, that our
techniques would similarly benefit from the addition of
prosodic information.
In the introduction, we mentioned the problem of sen-
tence boundary detection, which is related to the punc-
tuation reconstruction problem especially with regard to
predicting sentence boundary punctuation such as peri-
ods, question marks, and exclamation marks. (This prob-
lem is distinct from the problem of sentence boundary
disambiguation, where punctuation is provided, but the
categorization of the punctuation as to whether or not
9An alternative method of resolving the data sparsity issues
is to back off the model p(yi j yi 2yi 1eci), for instance to
p(yi j yi 2yi 1) or to p(yi j yi 1eci). Both of these perform
less well than the approximation in model 8.
it marks a sentence boundary is at issue (Palmer and
Hearst, 1994; Reynar and Ratnaparkhi, 1997).) Stolcke
and Shriberg (1996) used HMMs for the related problem
of linguistic segmentation of text, where the segments
corresponded to sentences and other self-contained units
such as disfluencies and interjections. They argue that a
linguistic segmentation is useful for improving the per-
formance and utility of language models and speech rec-
ognizers. Like the present work, they segment clean text
rather than automatically transcribed speech. Stevenson
and Gaizauskas (Stevenson and Gaizauskas, 2000) and
Goto and Renals (Gotoh and Renals, 2000) address the
sentence boundary detection problem directly, again us-
ing lexical and, in the latter, prosodic cues.
6 Future Work and Conclusion
The experiments reported here — like much of the previ-
ous work in comma restoration (Beeferman et al., 1998)
and sentence boundary disambiguation and restoration
(Stolcke and Shriberg, 1996; Shriberg et al., 2001; Go-
toh and Renals, 2000; Stevenson and Gaizauskas, 2000)
(though not all (Christensen et al., 2001; Stolcke et al.,
1998; Kim and Woodland, 2001)) — assume an ideal ref-
erence transcription of the text. The performance of the
method on automatically transcribed speech with its con-
comitant error remains to be determined. A hopeful sign
is the work of Kim and Woodland (Kim and Woodland,
2001) on punctuation reconstruction using prosodic in-
formation. The performance of their system drops from
an F-measure of 78% on reference transcriptions to 44%
on automatically transcribed speech at a word error rate
of some 20%. Nonetheless, prosodic features were still
useful in improving the reconstructed punctuation even
in the automatically transcribed case.
The simple HMM model that we inherit from earlier
work dramatically limits the features of the parse that we
can easily appeal to in predicting comma locations. Many
alternatives suggest themselves to expand the options,
including maximum entropy models, which have been
previously successfully applied to, inter alia, sentence
boundary detection (Reynar and Ratnaparkhi, 1997), and
transformation-based learning, as used in part-of-speech
tagging and statistical parsing applications (Brill, 1995).
In addition, all of the methods above are essentially
nonhierarchical, based as they are on HMMs. An alter-
native approach would use the statistical parsing model
itself as a model of comma placement, that is, to select
the comma placement for a string such that the resulting
reconstructed string has maximum likelihood under the
statistical parsing model. This approach has the benefit
that the ramifications of comma placement on all aspects
of the syntactic structure are explored, but the disadvan-
tage that the longer distance lexical relationships found
in a trigram model are eliminated.
Nonetheless, even under these severe constraints and
using quite simple features distilled from the parse, we
can reduce sentence error by 20%, with the potential of
another 8% if statistical parsers were to approach Tree-
bank quality. As such, comma restoration may stand as
the first end-user application that benefits from statisti-
cal parsing technology smoothly and incrementally. Fi-
nally, our methods use features that are orthogonal to the
prosodic features that other researchers have explored.
They therefore have the potential to combine well with
prosodic methods to achieve further improvements.
Acknowledgments
Partial support for the work reported in this paper was
provided by the National Science Foundation under grant
number IRI 9712068.
We are indebted to Douglas Beeferman for making
available his expertise and large portions of the code and
data for replicating the CYBERPUNC experiments.
The first author would like to express his appreciation
to Ivan Sag and the Center for the Study of Language and
Information, Stanford, California and to Oliviero Stock
and the Centro per la Ricerca Scientifica e Tecnologica,
Trento, Italy, for space and support for this work during
spring and summer of 2002.
References
Doug Beeferman, Adam Berger, and John Lafferty.
1998. CYBERPUNC: A lightweight punctuation an-
notation system for speech. In Proceeding as of the
IEEE International Conference on Acoustics, Speech
and Signal Processing, pages 689–692, Seattle, WA.
Eric Brill. 1995. Transformation-based error-driven
learning and natural language processing: A case study
in part of speech tagging. Computational Linguistics,
21(4):543–565.
Heidi Christensen, Yoshihiko Gotoh, and Steve Renals.
2001. Punctuation annotation using statistical prosody
models. In Proceedings of the 2001 ISCA Tutorial and
Research Workshop on Prosody in Speech Recognition
and Understanding, Red Bank, NJ, October 22–24. In-
ternational Speech Communication Association.
Philip Clarkson and Ronald Rosenfeld. 1997. Statistical
language modeling using the CMU-Cambridge toolkit.
In Proceedings of Eurospeech ’97, pages 2707–2710,
Rhodes, Greece, 22–25 September.
Michael Collins. 1997. Three generative, lexicalised
models for statistical parsing. In Proceedings of the
35th Annual Meeting of the Association for Compu-
tational Linguistics and Eighth Conference of the Eu-
ropean Chapter of the Association for Computational
Linguistics, pages 16–23, Madrid, Spain, 7–11 July.
Yoshihiko Gotoh and Steve Renals. 2000. Sentence
boundary detection in broadcast speech transcripts.
In Proceedings of the ISCA Workshop on Automatic
Speech Recognition: Challenges for the New Millen-
nium (ASR-2000), Paris, France, 18–20 September. In-
ternational Speech Communication Association.
Ji-Hwan Kim and P. C. Woodland. 2001. The use of
prosody in a combined system for punctuation gener-
ation and speech recognition. In Proceedings of Eu-
rospeech ’01, pages 2757–2760, Aalborg, Denmark,
September 3–7.
Geoffrey Nunberg. 1990. The Linguistics of Punctua-
tion. CSLI Publications, Stanford, CA.
David D. Palmer and Marti A. Hearst. 1994. Adaptive
sentence boundary disambiguation. In Proceedings of
the Fourth ACL Conference on Applied Natural Lan-
guage Processing, pages 78–83, Stuttgart, Germany,
13–15 October. Morgan Kaufmann.
Jeffrey C. Reynar and Adwait Ratnaparkhi. 1997. A
maximum entropy approach to identifying sentence
boundaries. In Proceedings of the Fifth Conference on
Applied Natural Language Processing, pages 16–19,
Washington, DC, 31 March–3 April.
Elizabeth Shriberg, Andreas Stolcke, and Don Baron.
2001. Can prosody aid the automatic processing of
multi-party meetings? Evidence from predicting punc-
tuation, disfluencies, and overlapping speech. In Pro-
ceedings of the 2001 ISCA Tutorial and Research
Workshop on Prosody in Speech Recognition and Un-
derstanding, Red Bank, NJ, October 22–24. Interna-
tional Speech Communication Association.
Mark Stevenson and Robert Gaizauskas. 2000. Ex-
periments on sentence boundary detection. In Pro-
ceedings of the Sixth Conferernce on Applied Natu-
ral Language Processing and the First Conference of
the North American Chapter of the Association for
Computational Linguistics, pages 24–30, Seattle, WA,
April.
Andreas Stolcke and Elizabeth Shriberg. 1996. Au-
tomatic linguistic segmentation of conversational
speech. In H. T. Bunnell and W. Idsardi, editors, Pro-
ceedings of the International Conference on Spoken
Language Processing, volume 2, pages 1005–1008,
Philadelphia, PA, 3–6 October.
Andreas Stolcke, Elizabeth Shriberg, Rebecca Bates,
Mari Ostendorf, Dilek Hakkani, Madelaine Plauche,
G¨okhan T¨ur, and Yu Lu. 1998. Automatic detection of
sentence boundaries and disfluencies based on recog-
nized words. In Proceedings of the International Con-
ference on Spoken Language Processing, volume 5,
pages 2247–2250, Sydney, Australia, 30 November–4
December.
