INTEGRATING MULTIPLE KNOWLEDGE SOURCES FOR 
DETECTION AND CORRECTION OF REPAIRS IN 
HUMAN-COMPUTER DIALOG* 
John Bear, John Dowding, Elizabeth Shriberg†
SRI International 
Menlo Park, California 94025 
ABSTRACT 
We have analyzed 607 sentences of sponta- 
neous human-computer speech data containing re- 
pairs, drawn from a total corpus of 10,718 sen- 
tences. We present here criteria and techniques for
automatically detecting the presence of a repair,
locating it, and making the appropriate correc-
tion. The criteria involve integration of knowledge
from several sources: pattern matching, syntactic 
and semantic analysis, and acoustics. 
INTRODUCTION 
Spontaneous spoken language often includes 
speech that is not intended by the speaker to be 
part of the content of the utterance. This speech 
must be detected and deleted in order to correctly 
identify the intended meaning. The broad class 
of disfluencies encompasses a number of phenom- 
ena, including word fragments, interjections, filled 
pauses, restarts, and repairs. We are analyzing 
the repairs in a large subset (over ten thousand 
sentences) of spontaneous speech data collected 
for the DARPA Spoken Language Program.1 We
have categorized these disfluencies as to type and 
frequency, and are investigating methods for their 
automatic detection and correction. Here we re- 
port promising results on detection and correction 
of repairs by combining pattern matching, syn- 
tactic and semantic analysis, and acoustics. This 
paper extends work reported in an earlier paper
(Shriberg et al., 1992a).

*This research was supported by the Defense Advanced
Research Projects Agency under Contract ONR N00014-
90-C-0085 with the Office of Naval Research. It was also
supported by a Grant, NSF IRI-8905249, from the National
Science Foundation. The views and conclusions contained
in this document are those of the authors and should not
be interpreted as necessarily representing the official poli-
cies, either expressed or implied, of the Defense Advanced
Research Projects Agency of the U.S. Government, or of
the National Science Foundation.
†Elizabeth Shriberg is also affiliated with the Depart-
ment of Psychology at the University of California at
Berkeley.
1 DARPA is the Defense Advanced Research Projects
Agency of the United States Government.

The problem of disfluent speech for language 
understanding systems has been noted but has 
received limited attention. Hindle (1983) at- 
tempts to delimit and correct repairs in sponta- 
neous human-human dialog, based on transcripts 
containing an "edit signal," or external and reli- 
able marker at the "expunction point," or point of 
interruption. Carbonell and Hayes (1983) briefly 
describe recovery strategies for broken-off and 
restarted utterances in textual input. Ward (1991) 
addresses repairs in spontaneous speech, but does 
not attempt to identify or correct them. Our ap- 
proach is most similar to that of Hindle. It differs, 
however, in that we make no assumption about 
the existence of an explicit edit signal. As a reli- 
able edit signal has yet to be found, we take it as 
our problem to find the site of the repair automat- 
ically. 
It is the case, however, that cues to repair exist 
over a range of syllables. Research in speech pro- 
duction has shown that repairs tend to be marked 
prosodically (Levelt and Cutler, 1983) and there 
is perceptual evidence from work using lowpass- 
filtered speech that human listeners can detect the 
occurrence of a repair in the absence of segmental 
information (Lickley, 1991). 
In the sections that follow, we describe in de- 
tail our corpus of spontaneous speech data and 
present an analysis of the repair phenomena ob- 
served. In addition, we describe ways in which 
pattern matching, syntactic and semantic analy- 
sis, and acoustic analysis can be helpful in detect- 
ing and correcting these repairs. We use pattern 
matching to determine an initial set of possible 
repairs; we then apply information from syntac- 
tic, semantic, and acoustic analyses to distinguish 
actual repairs from false positives. 
THE CORPUS 
The data we are analyzing were collected 
as part of DARPA's Spoken Language Systems 
project. The corpus contains digitized waveforms 
and transcriptions of a large number of sessions in 
which subjects made air travel plans using a com- 
puter. In the majority of sessions, data were col- 
lected in a Wizard of Oz setting, in which subjects 
were led to believe they were talking to a com- 
puter, but in which a human actually interpreted 
and responded to queries. In a small portion of 
the sessions, data were collected using SRI's Spo- 
ken Language System (Shriberg et al., 1992b), in 
which no human intervention was involved. Rel- 
evant to the current paper is the fact that al- 
though the speech was spontaneous, it was some- 
what planned (subjects pressed a button to begin 
speaking to the system) and the transcribers who 
produced lexical transcriptions of the sessions were 
instructed to mark words they inferred were ver- 
bally deleted by the speaker with special symbols. 
For further description of the corpus, see MAD- 
COW (1992). 
NOTATION 
In order to classify these repairs, and to facil- 
itate communication among the authors, it was 
necessary to develop a notational system that 
would: (1) be relatively simple, (2) capture suf- 
ficient detail, and (3) describe the vast majority 
of repairs observed. Table 1 shows examples of 
the notation used, which is described fully in Bear 
et al. (1992). 
The basic aspects of the notation include 
marking the interruption point, the extent of 
the repair, and relevant correspondences between 
words in the region. To mark the site of a re-
pair, corresponding to Hindle's "edit signal" (Hin-
dle, 1983), we use a vertical bar (|). To express
the notion that words on one side of the repair 
correspond to words on the other, we use a com- 
bination of a letter plus a numerical index. The 
letter M indicates that two words match exactly. 
R indicates that the second of the two words 
was intended by the speaker to replace the first. 
The two words must be similar: either of the same
lexical category, or morphological variants of the 
same base form (including contraction pairs like 
"I/I'd"). Any other word within a repair is no- 
tated with X. A hyphen affixed to a symbol in- 
dicates a word fragment. In addition, certain cue 
words, such as "sorry" or "oops" (marked with 
CR) as well as filled pauses (CF) are also labeled
if they occur immediately before the site of a re-
pair.

Utterance                            Notation
I want fl- flights to boston.        M1- | M1
what what are the fares              M1 | M1
show me flights daily flights        M1 | X M1
I want a flight one way flight       M1 | X X M1
I want to leave depart before ...    R1 | R1
what are what are the fares          M1 M2 | M1 M2
... fly to boston from boston        R1 M1 | R1 M1
... fly from boston from denver      M1 R1 | M1 R1
what are are there any flights       X X |
Table 1: Examples of Notation
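As a minimal sketch of how an annotation in this notation determines a correction, the following assumes (as the examples in Table 1 suggest) that every labeled word to the left of the vertical bar is deleted and everything else is kept. The function name `apply_repair` and the parallel-list representation are illustrative assumptions, not part of the labeling conventions themselves.

```python
from typing import List, Optional


def apply_repair(words: List[str], labels: List[Optional[str]]) -> List[str]:
    """Return the words that remain after the annotated repair is applied.

    `labels` runs parallel to `words` but holds one extra entry, "|",
    at the interruption point. Unlabeled words are given as None.
    (Hypothetical representation chosen for this sketch.)
    """
    bar = labels.index("|")
    # Drop the bar so that the labels line up with the words again.
    word_labels = labels[:bar] + labels[bar + 1:]
    kept = []
    for i, (word, label) in enumerate(zip(words, word_labels)):
        deleted = i < bar and label is not None  # labeled and left of the bar
        if not deleted:
            kept.append(word)
    return kept


insertion = apply_repair(
    ["show", "me", "flights", "daily", "flights"],
    [None, None, "M1", "|", "X", "M1"],
)
restart = apply_repair(
    ["what", "are", "are", "there", "any", "flights"],
    ["X", "X", "|", None, None, None, None],
)
print(" ".join(insertion))
print(" ".join(restart))
```

Under this reading, the labels to the right of the bar only record correspondences; they never trigger a deletion.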
DISTRIBUTION 
Of the 10,718 sentences in our corpus, 607 con-
tained repairs. We found that 10% of sentences
longer than nine words contained repairs. In con- 
trast, Levelt (1983) reports a repair rate of 34% for 
human-human dialog. While the rates in this cor- 
pus are lower, they are still high enough to be sig- 
nificant. And, as system developers move toward 
more closely modeling human-human interaction, 
the percentage is likely to rise. 
Although only 607 sentences contained dele- 
tions, some sentences contained more than one, 
for a total of 646 deletions. Table 2 gives the 
breakdown of deletions by length, where length 
is defined as the number of consecutive deleted 
words or word fragments.

Deletion Length   Occurrences   Percentage
1                 376           59%
2                 154           24%
3                 52            8%
4                 25            4%
5                 23            4%
6+                16            3%
Table 2: Distribution of Repairs by Length

Type           Pattern                      Freq.
Length 1 Repairs
Fragments      M1-, R1-, X-                 61%
Repeats        M1 | M1                      16%
Insertions     M1 | X1 ... Xi M1            7%
Replacement    R1 | R1                      9%
Other          X | X                        5%
Length 2 Repairs
Repeats        M1 M2 | M1 M2                28%
Replace 2nd    M1 R1 | M1 R1                27%
Insertions     M1 M2 | M1 X1 ... Xi M2      19%
Replace 1st    R1 M1 | R1 M1                10%
Other          ... | ...                    17%
Table 3: Distribution of Repairs by Type

Match     Fill Length
Length    0       1       2       3
1         .82     .74     .69     .28
          (39)    (65)    (43)    (39)
2         1.0     .83     .73     .00
          (10)    (6)     (11)    (1)
3         1.0     .80     1.0     --
          (4)     (5)     (2)
4         1.0     1.0     --      --
          (2)     (1)
-- indicates no observations
Table 4: Fill Length vs. Match Length

Most of the deletions
were fairly short; deletions of one or two words ac-
counted for 82% of the data. We categorized the
length 1 and length 2 repairs according to their 
transcriptions. The results are summarized in Ta- 
ble 3. For simplicity, in this table we have counted 
fragments (which always occurred as the second 
deleted word) as whole words. The overall rate of 
fragments for the length 2 repairs was 34%. 
A major repair type involved matching strings 
of identical words. More than half (339 out of 436) 
of the nontrivial repairs (more editing necessary 
than deleting fragments and filled pauses) in the 
corpus were of this type. Table 4 shows the distri- 
butions of these repairs with respect to two param- 
eters: the length in words of the matched string, 
and the number of words between the two matched 
strings. Numbers in parentheses indicate the num- 
ber of occurrences, and probabilities represent the 
likelihood that the phrase was actually a repair 
and not a false positive. Two trends emerge from 
these data. First, the longer the matched string, 
the more likely the phrase was a repair. Second, 
the more words there were intervening between the 
matched strings, the less likely the phrase was a 
repair. 
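The two trends can be checked directly against the figures in Table 4. The sketch below pools the cells by match length and by fill length, weighting each proportion by its number of occurrences. It assumes that the empty cells of the table fall at the largest fill lengths for match lengths 3 and 4; the dictionary layout and function names are illustrative only.

```python
from typing import Dict, Iterable, Tuple

Cell = Tuple[float, int]  # (proportion that were repairs, occurrences)

# (match length, fill length) -> cell, copied from Table 4.
# Assumption: the unobserved cells are the high-fill cells of rows 3 and 4.
TABLE_4: Dict[Tuple[int, int], Cell] = {
    (1, 0): (0.82, 39), (1, 1): (0.74, 65), (1, 2): (0.69, 43), (1, 3): (0.28, 39),
    (2, 0): (1.00, 10), (2, 1): (0.83, 6), (2, 2): (0.73, 11), (2, 3): (0.00, 1),
    (3, 0): (1.00, 4), (3, 1): (0.80, 5), (3, 2): (1.00, 2),
    (4, 0): (1.00, 2), (4, 1): (1.00, 1),
}


def pooled_rate(cells: Iterable[Cell]) -> float:
    """Occurrence-weighted proportion of repairs over a set of cells."""
    cells = list(cells)
    total = sum(n for _, n in cells)
    return sum(p * n for p, n in cells) / total


def rate_for_match_length(match_len: int) -> float:
    return pooled_rate(v for (m, _), v in TABLE_4.items() if m == match_len)


def rate_for_fill_length(fill_len: int) -> float:
    return pooled_rate(v for (_, f), v in TABLE_4.items() if f == fill_len)


by_match = [rate_for_match_length(m) for m in (1, 2, 3, 4)]
by_fill = [rate_for_fill_length(f) for f in (0, 1, 2, 3)]
print([round(r, 2) for r in by_match])  # rises with match length
print([round(r, 2) for r in by_fill])   # falls with fill length
```

Because the counts for longer matches are very small, the pooled rates for match lengths 3 and 4 should be read as suggestive rather than reliable.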
SIMPLE PATTERN MATCHING 
We analyzed a subset of 607 sentences con- 
taining repairs and concluded that certain sim- 
ple pattern-matching techniques could successfully 
detect a number of them. The pattern-matching 
component reported on here looks for identical se- 
quences of words, and simple syntactic anomalies, 
such as "a the" or "to from." 
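A search for identical word sequences of the kind described here might look like the following sketch. It is not the program used in the study: the limits on match length and on the number of intervening words, and the names `Candidate` and `find_match_candidates`, are assumptions made for illustration, and the check for simple syntactic anomalies is left out.

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    start: int      # index of the first word of the first matched string
    match_len: int  # number of words in the matched string
    fill_len: int   # number of words between the two matched strings


def find_match_candidates(
    words: List[str], max_match: int = 4, max_fill: int = 3
) -> List[Candidate]:
    """List every place where a word string recurs after a short gap."""
    candidates = []
    for start in range(len(words)):
        for match_len in range(1, max_match + 1):
            first = words[start:start + match_len]
            if len(first) < match_len:
                break  # ran off the end of the sentence
            for fill_len in range(max_fill + 1):
                second_start = start + match_len + fill_len
                second = words[second_start:second_start + match_len]
                if second == first:
                    candidates.append(Candidate(start, match_len, fill_len))
    return candidates


repeat = find_match_candidates("what are what are the fares".split())
insertion = find_match_candidates("show me flights daily flights".split())
print(repeat)
print(insertion)
```

Even on a six-word sentence the search returns several overlapping candidates, which is one reason a matcher of this kind proposes more repair sites than it can correct reliably.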
Of the 406 sentences containing nontrivial re- 
pairs, the program successfully found 309. Of 
these it successfully corrected 177. There were 97 
sentences that contained repairs which it did not 
find. In addition, out of the 10,517 sentence corpus 
(10,718 - 201 trivial), it incorrectly hypothesized 
that an additional 191 contained repairs. Thus of 
10,517 sentences of varying lengths, it pulled out 
500 as possibly containing a repair and missed 97 
sentences actually containing a repair. Of the 500 
that it proposed as containing a repair, 62% actu- 
ally did and 38% did not. Of the 62% that had re- 
pairs, it made the appropriate correction for 57%. 
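The percentages above follow directly from the counts reported in this section. The short calculation below reproduces them; the variable names are ours, and the recall figure is simply the found sentences divided by all sentences with nontrivial repairs.

```python
# Counts reported in the text for the pattern matcher.
found = 309          # sentences with nontrivial repairs that were detected
corrected = 177      # detected sentences that were also corrected properly
missed = 97          # sentences with nontrivial repairs that were not detected
false_alarms = 191   # sentences wrongly hypothesized to contain a repair

proposed = found + false_alarms   # sentences pulled out by the matcher
with_repairs = found + missed     # sentences that really contain a repair

precision = found / proposed          # share of proposals that were repairs
recall = found / with_repairs         # share of repairs that were proposed
correction_rate = corrected / found   # share of detections corrected properly

print(proposed, with_repairs)
print(round(precision, 3), round(recall, 3), round(correction_rate, 3))
```

The recall computed this way is a little over three quarters, measured per sentence rather than per repair.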
These numbers show that although pattern 
matching is useful in identifying possible repairs, 
it is less successful at making appropriate correc- 
tions. This problem stems largely from the over- 
lap of related patterns. Many sentences contain a 
subsequence of words that match not one but sev- 
eral patterns. For example the phrase "FLIGHT 
<word> FLIGHT" matches three different pat- 
terns: 
show the flight time flight date
         M1     R1 | M1     R1
show the flight earliest flight
         M1 | X        M1
show the delta flight united flight
         R1    M1 | R1     M1
Each of these sentences is a false positive for 
the other two patterns. Despite these problems 
of overlap, pattern matching is useful in reducing 
the set of candidate sentences to be processed for 
repairs. Rather than applying detailed and pos- 
sibly time-intensive analysis techniques to 10,000 
sentences, we can increase efficiency by limiting 
ourselves to the 500 sentences selected by the pat- 
tern matcher, which has (at least on one measure) 
a 75% recall rate. The repair sites hypothesized 
by the pattern matcher constitute useful input for 
further processing based on other sources of infor- 
mation. 
NATURAL LANGUAGE 
CONSTRAINTS 
Here we describe two sets of experiments to 
measure the effectiveness of a natural language 
processing system in distinguishing repairs from 
false positives. One approach is based on parsing 
of whole sentences; the other is based on parsing 
localized word sequences identified as potential re- 
pairs. Both of these experiments rely on the pat- 
tern matcher to suggest potential repairs. 
The syntactic and semantic components of the 
Gemini natural language processing system are 
used for both of these experiments. Gemini is 
an extensive reimplementation of the Core Lan- 
guage Engine (Alshawi et al., 1988). It includes 
modular syntactic and semantic components, inte- 
grated into an efficient all-paths bottom-up parser 
(Moore and Dowding, 1991). Gemini was trained 
on a 2,200-sentence subset of the full 10,718- 
sentence corpus. Since this subset excluded the 
unanswerable sentences, Gemini's coverage on the 
full corpus is only an estimated 70% for syntax, 
and 50% for semantics.2

2 Gemini's syntactic coverage of the 2,200-sentence
dataset it was trained on (the set of annotated and an-
swerable MADCOW queries) is approximately 91%, while
its semantic coverage is approximately 77%. On a recent
fair test, Gemini's syntactic coverage was 87% and seman-
tic coverage was 71%.

Global Syntax and Semantics
In the first experiment, based on parsing com-
plete sentences, Gemini was tested on a subset
of the data that the pattern matcher returned as
likely to contain a repair. We excluded all sen-
tences that contained fragments, resulting in a
dataset of 335 sentences, of which 179 contained
repairs and 176 contained false positives. The ap-
proach was as follows: for each sentence, parsing
was attempted. If parsing succeeded, the sentence
was marked as a false positive. If parsing did not
succeed, then pattern matching was used to detect
possible repairs, and the edits associated with the
repairs were made. Parsing was then reattempted.
If parsing succeeded at this point, the sentence was
marked as a repair. Otherwise, it was marked as
no opinion.

Syntax Only
                   Marked as Repair   Marked as False Positive
Repairs            68 (96%)           56 (30%)
False Positives    3 (4%)             131 (70%)

Syntax and Semantics
                   Marked as Repair   Marked as False Positive
Repairs            64 (85%)           23 (20%)
False Positives    11 (15%)           90 (80%)
Table 5: Syntax and Semantics Results
Table 5 shows the results of these experiments. 
We ran them two ways: once using syntactic con- 
straints alone and again using both syntactic and 
semantic constraints. As can be seen, Gemini 
is quite accurate at detecting a repair, although 
somewhat less accurate at detecting a false posi- 
tive. Furthermore, in cases where Gemini detected 
a repair, it produced the intended correction in 62 
out of 68 cases for syntax alone, and in 60 out of 
64 cases using combined syntax and semantics. In 
both cases, a large number of sentences (29% for 
syntax, 50% for semantics) received a no opinion 
evaluation. The no opinion cases were evenly 
split between repairs and false positives in both 
tests. 
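The decision procedure used in this experiment (parse, edit, reparse) can be sketched as follows. The parser and the pattern matcher are passed in as plain functions; the toy stand-ins `toy_parses` and `drop_adjacent_repeat` below are hypothetical and stand in for Gemini and the pattern matcher only for the purpose of running the example.

```python
from typing import Callable, List, Optional, Tuple

Words = List[str]


def classify_sentence(
    words: Words,
    parses: Callable[[Words], bool],
    propose_edit: Callable[[Words], Optional[Words]],
) -> Tuple[str, Words]:
    """Label a sentence as false positive, repair, or no opinion."""
    if parses(words):
        return "false positive", words   # parses as spoken: nothing to fix
    edited = propose_edit(words)         # ask the pattern matcher for an edit
    if edited is not None and parses(edited):
        return "repair", edited          # the edit made the sentence parse
    return "no opinion", words


# Toy stand-ins so that the sketch runs; not the real components.
GRAMMATICAL = {"what are the fares", "show me flights"}


def toy_parses(words: Words) -> bool:
    return " ".join(words) in GRAMMATICAL


def drop_adjacent_repeat(words: Words) -> Optional[Words]:
    for i in range(len(words) - 1):
        if words[i] == words[i + 1]:
            return words[:i] + words[i + 1:]
    return None


print(classify_sentence("what what are the fares".split(), toy_parses, drop_adjacent_repeat))
print(classify_sentence("show me flights".split(), toy_parses, drop_adjacent_repeat))
print(classify_sentence("show me me fares".split(), toy_parses, drop_adjacent_repeat))
```

The no opinion outcome arises whenever the parser fails for reasons unrelated to the repair, which is why limited grammatical coverage translates directly into a large no opinion category.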
The main points to be noted from Table 5 are 
that with syntax alone, the system is quite ac- 
curate in detecting repairs, and with syntax and 
semantics working together, it is accurate at de- 
tecting false positives. However, since the coverage 
of syntax and semantics will always be lower than 
the coverage of syntax alone, we cannot compare 
these rates directly. 
Since multiple repairs and false positives can 
occur in the same sentence, the pattern matching 
process is constrained to prefer fewer repairs to 
more repairs, and shorter repairs to longer repairs. 
This is done to favor an analysis that deletes the 
fewest words from a sentence. It is often the case 
that more drastic repairs would result in a syntac- 
tically and semantically well-formed sentence, but 
not the sentence that the speaker intended. For 
instance, the sentence "show me <flights> daily 
flights to boston" could be repaired by deleting 
the words "flights daily," and would then yield a 
grammatical sentence, but in this case the speaker 
intended to delete only "flights." 
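One way to express this preference is as a sort key over competing analyses, as in the sketch below. Representing an analysis as a list of deleted word spans, and ranking first by number of repairs and then by number of deleted words, are assumptions made for illustration; the text does not specify how the two preferences are combined.

```python
from typing import List, Tuple

Span = Tuple[int, int]   # word indices to delete: start inclusive, end exclusive
Analysis = List[Span]    # one span per hypothesized repair


def preference_key(analysis: Analysis) -> Tuple[int, int]:
    """Fewer repairs first, then fewer deleted words (assumed ordering)."""
    deleted_words = sum(end - start for start, end in analysis)
    return (len(analysis), deleted_words)


def choose_analysis(analyses: List[Analysis]) -> Analysis:
    return min(analyses, key=preference_key)


def delete_spans(words: List[str], analysis: Analysis) -> List[str]:
    doomed = {i for start, end in analysis for i in range(start, end)}
    return [w for i, w in enumerate(words) if i not in doomed]


words = "show me flights daily flights to boston".split()
analyses = [
    [(2, 4)],  # delete "flights daily"
    [(2, 3)],  # delete only the first "flights"
]
best = choose_analysis(analyses)
print(" ".join(delete_spans(words, best)))
```

Both analyses yield a grammatical sentence, so only the preference for the shorter deletion recovers what the speaker meant.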
Local Syntax and Semantics 
In the second experiment we attempted to im- 
prove robustness by applying the parser to small 
substrings of the sentence. When analyzing long 
word strings, the parser is more likely to fail due 
to factors unrelated to the repair. For this ex- 
periment, the parser was using both syntax and 
semantics. 
The phrases used for this experiment were the 
phrases found by the pattern matcher to contain 
matching strings of length one, with up to three 
intervening words. This set was selected because, 
as can be seen from Table 4, it constitutes a large 
subset of the data (186 such phrases). Further- 
more, pattern matching alone contains insufficient 
information for reliably correcting these sentences. 
The relevant substring is taken to be the 
phrase constituting the matched string plus in- 
tervening material plus the immediately preceding 
word. So far we have used only phrases where the 
grammatical category of the matched word was ei- 
ther noun or name (proper noun). For this test we 
specified a list of possible phrase types (NP, VP, 
PP, N, Name) that count as a successful parse. We 
intend to run other tests with other grammatical 
categories, but expect that these other categories 
could need a different heuristic for deciding which 
substring to parse, as well as a different set of ac- 
ceptable phrase types. 
Four candidate strings were derived from the 
original by making the three different possible 
edits, and also including the original string un- 
changed. Each of these strings was analyzed by 
the parser. When the original sequence did not 
parse, but one of the edits resulted in a sequence that
parsed, the original sequence was very unlikely to 
be a false positive (right for 34 of 35 cases). Fur- 
thermore, the edit that parsed was chosen to be 
the repaired string. When more than one of the 
edited strings parsed, the edit was chosen by pre- 
ferring them in the following order: (1) M1 | X M1,
(2) R1 M1 | R1 M1, (3) M1 R1 | M1 R1. Of the 37 cases
of repairs, the correct edit was found in 27 cases, 
while in 7 more an incorrect edit was found; in 
3 cases no opinion was registered. While these 
numbers are quite promising, they may improve 
even more when information from syntax and se- 
mantics is combined with that from acoustics. 
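The candidate generation and selection just described can be sketched for the simplest case of one intervening word. The reading of each edit below (which words are dropped from the four-word substring) is our interpretation of the three patterns, the treatment of a parsable original as a false positive is carried over from the first experiment, and `toy_parses` is a hypothetical stand-in for the phrase-level parser.

```python
from typing import Callable, Dict, List, Tuple

Words = List[str]

# Edit patterns in the preference order given in the text.
PREFERENCE = ["M1 | X M1", "R1 M1 | R1 M1", "M1 R1 | M1 R1"]


def candidate_edits(prev: str, m1: str, x: str) -> Dict[str, Words]:
    """Edits of the substring [prev, m1, x, m1] under each pattern."""
    return {
        "M1 | X M1": [prev, x, m1],   # drop the first m1
        "R1 M1 | R1 M1": [x, m1],     # drop "prev m1"; x replaces prev
        "M1 R1 | M1 R1": [prev, m1],  # drop "m1 x"; the next word replaces x
    }


def resolve(
    prev: str, m1: str, x: str, parses: Callable[[Words], bool]
) -> Tuple[str, Words]:
    original = [prev, m1, x, m1]
    if parses(original):
        return "false positive", original
    edits = candidate_edits(prev, m1, x)
    for pattern in PREFERENCE:
        if parses(edits[pattern]):
            return pattern, edits[pattern]
    return "no opinion", original


# Toy phrase-level parser: a fixed set of acceptable phrases.
ACCEPTABLE = {"the earliest flight", "the flight", "united flight"}


def toy_parses(words: Words) -> bool:
    return " ".join(words) in ACCEPTABLE


print(resolve("the", "flight", "earliest", toy_parses))
print(resolve("delta", "flight", "united", toy_parses))
```

When more than one edit parses, as with "the flight earliest flight" here, the fixed preference order decides among them.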
ACOUSTICS 
A third source of information that can be help- 
ful in detecting repairs is acoustics. In this sec- 
tion we describe first how prosodic information can 
help in distinguishing repairs from false positives 
for patterns involving matched words. Second, we 
report promising results from a preliminary study 
of cue words such as "no" and "well." And third, 
we discuss how acoustic information can aid in 
the detection of word fragments, which occur fre- 
quently and which pose difficulty for automatic 
speech recognition systems. 
Acoustic features reported in the following 
analyses were obtained by listening to the sound 
files associated with each transcription, and by 
inspecting waveforms, pitch tracks, and spectro- 
grams produced by the Entropic Waves software 
package. 
Simple Patterns 
While acoustics alone cannot tackle the prob- 
lem of locating repairs, since any prosodic patterns 
found in repairs are likely to be found in fluent 
speech, acoustic information can be quite effective 
when combined with other sources of information, 
in particular with pattern matching. 
In studying the ways in which acoustics might 
help distinguish repairs from false positives, we 
began by examining two patterns conducive to 
acoustic measurement and comparison. First, we 
focused on patterns in which there was only one 
matched word, and in which the two occurrences 
of that word were either adjacent or separated by 
only one word. Matched words allow for compar- 
ison of word duration; proximity helps avoid vari- 
ability due to global intonation contours not asso- 
ciated with the patterns themselves. We present 
here analyses for the M1 | M1 ("flights for <one>
one person") and M1 | X M1 ("<flight> earliest
flight") repairs, and their associated false positives
("u s air five one one," "a flight on flight number
five one one," respectively).
In examining the M1 | M1 repair pattern, we
found that the strongest distinguishing cue be- 
tween the repairs (N = 20) and the false positives 
(N = 20) was the interval between the offset of 
the first word and the onset of the second. False 
positives had a mean gap of 42 msec (s.d. = 55.8) 
as opposed to 380 msec (s.d. = 200.4) for repairs. 
A second difference found between the two groups 
was that, in the case of repairs, there was a statis- 
tically reliable reduction in duration for the sec- 
ond occurrence of M1, with a mean difference of 
53.4 msec. However, because false positives showed
no reliable difference for word duration, this was 
a much less useful predictor than gap duration. 
F0 of the matched words was not helpful in sep- 
arating repairs from false positives; both groups 
showed a highly significant correlation for, and no 
significant difference between, the mean F0 of the 
matched words. 
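To illustrate how strongly the gap separates the two groups, the sketch below compares the likelihood of an observed gap under two normal distributions fitted to the reported means and standard deviations. Modeling the gaps as Gaussian with equal priors is purely an assumption for illustration (gap durations cannot be negative and are unlikely to be normally distributed); no such classifier was part of the analysis.

```python
import math
from typing import Dict, Tuple

# (mean, standard deviation) of the gap in msec, as reported in the text.
GAP_STATS: Dict[str, Tuple[float, float]] = {
    "repair": (380.0, 200.4),
    "false positive": (42.0, 55.8),
}


def log_normal_pdf(x: float, mean: float, sd: float) -> float:
    """Log density of a normal distribution at x."""
    return -math.log(sd * math.sqrt(2.0 * math.pi)) - 0.5 * ((x - mean) / sd) ** 2


def classify_gap(gap_msec: float) -> str:
    """Pick the group under which the observed gap is more likely."""
    return max(GAP_STATS, key=lambda label: log_normal_pdf(gap_msec, *GAP_STATS[label]))


for gap in (30.0, 400.0):
    print(gap, classify_gap(gap))
```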
A different set of features was found to be use- 
ful in distinguishing repairs from false positives 
for the M1 | X M1 pattern. A set of 12 repairs
and 24 false positives was examined; the set of 
false positives for this analysis included only flu- 
ent cases (i.e., it did not include other types of 
repairs matching the pattern). Despite the small 
data set, some suggestive trends emerge. For ex- 
ample, for cases in which there was a pause (200 
msec or greater) on only one side of the inserted 
word, the pause was never after the insertion (X) 
for the repairs, and rarely before the X in the 
false positives. A second distinguishing character- 
istic was the peak F0 value of X. For repairs, the 
inserted word was nearly always higher in F0 than 
the preceding M1; for false positives, this increase 
in F0 was rarely observed. Table 6 shows the re- 
sults of combining the acoustic constraints just de- 
scribed. As can be seen, such features in combina- 
tion can be quite helpful in distinguishing repairs 
from false positives of this pattern. Future work 
will investigate the use of prosody in distinguish- 
ing the M1 | X M1 repair not only from false posi-
tives, but also from other possible repairs having
this pattern, i.e., M1 R1 | M1 R1 and R1 M1 | R1 M1.
                                        Repairs   False Positives
Pauses after X (only) and
F0 of X less than F0 of 1st M1          .00       .58
Pauses before X (only) and
F0 of X greater than F0 of 1st M1       .92       .00
Table 6: Combining Acoustic Characteristics of
M1 | X M1 Repairs
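The two combined conditions in Table 6 translate naturally into a small decision rule, sketched below. The 200 msec pause threshold comes from the text; the function name, the argument names, and the choice to return "undecided" when neither condition holds are assumptions of this sketch.

```python
PAUSE_MSEC = 200.0  # a silence this long or longer counts as a pause


def classify_m1_x_m1(
    pause_before_x_msec: float,
    pause_after_x_msec: float,
    peak_f0_x_hz: float,
    peak_f0_first_m1_hz: float,
) -> str:
    """Apply the two combined conditions of Table 6 to one candidate."""
    pause_before = pause_before_x_msec >= PAUSE_MSEC
    pause_after = pause_after_x_msec >= PAUSE_MSEC
    if pause_before and not pause_after and peak_f0_x_hz > peak_f0_first_m1_hz:
        return "repair"
    if pause_after and not pause_before and peak_f0_x_hz < peak_f0_first_m1_hz:
        return "false positive"
    return "undecided"


print(classify_m1_x_m1(350.0, 20.0, 210.0, 180.0))
print(classify_m1_x_m1(10.0, 260.0, 150.0, 190.0))
print(classify_m1_x_m1(10.0, 20.0, 200.0, 190.0))
```

In the data of Table 6 the first condition covered most of the repairs and none of the false positives, while the second covered over half of the false positives and none of the repairs.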
Cue Words 
A second way in which acoustics can be helpful 
given the output of a pattern matcher is in deter- 
mining whether or not potential cue words such 
as "no" are used as an editing expression (Hock- 
ett, 1967) as in "...flights <between> <boston> 
<and> <dallas> <no> between oakland and 
boston." False positives for these cases are in- 
stances in which the cue word functions in some 
other sense ("I want to leave boston no later than 
one p m."). Hirschberg and Litman (1987) have
shown that cue words that function differently can 
be distinguished perceptually by listeners on the 
basis of prosody. Thus, we sought to determine 
whether acoustic analysis could help in deciding, 
when such words were present, whether or not 
they marked the interruption point of a repair. 
In a preliminary study of the cue words "no" 
and "well," we compared 9 examples of these 
words at the site of a repair to 15 examples of 
the same words occurring in fluent speech. We 
found that these groups were quite distinguishable 
on the basis of simple prosodic features. Table 7 
shows the percentage of repairs versus false pos- 
itives characterized by a clear rise or fall in F0 
(greater than 15 Hz), lexical stress (determined
perceptually), and continuity of the speech im-
mediately preceding and following the editing ex-
pression ("continuous" means there was no silent
pause on either side of the cue word). As can be
seen, at least for this limited data set, cue words
marking repairs were quite distinguishable from
those same words found in fluent strings on the
basis of simple prosodic features.

                  F0     F0     Lexical   Cont.
                  rise   fall   stress    speech
Repairs           .00    1.00   .00       .00
False Positives   .87    .00    .87       .73
Table 7: Acoustic Characteristics of Cue Words

[Figure: frequency axis up to 8000; time axis from 1.4 to 3.2; word labels "I would like to <fra-> fly"]
Figure 1: A glottalized fragment
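The features in Table 7 suggest a simple test for whether a cue word marks a repair, sketched below. The 15 Hz threshold is from the text; measuring F0 movement as the difference between the start and end of the word, and requiring all three repair-like features at once, are assumptions made for this illustration.

```python
def f0_movement(f0_start_hz: float, f0_end_hz: float, threshold_hz: float = 15.0) -> str:
    """Label the F0 contour of a word as a clear rise, a clear fall, or level."""
    delta = f0_end_hz - f0_start_hz
    if delta > threshold_hz:
        return "rise"
    if delta < -threshold_hz:
        return "fall"
    return "level"


def looks_like_editing_expression(
    f0_start_hz: float,
    f0_end_hz: float,
    lexically_stressed: bool,
    pause_before: bool,
    pause_after: bool,
) -> bool:
    """True when the cue word shows the profile seen at repair sites."""
    continuous = not (pause_before or pause_after)  # no pause on either side
    return (
        f0_movement(f0_start_hz, f0_end_hz) == "fall"
        and not lexically_stressed
        and not continuous
    )


print(looks_like_editing_expression(200.0, 170.0, False, True, True))
print(looks_like_editing_expression(170.0, 200.0, True, False, False))
```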
Fragments 
A third way in which acoustic knowledge can 
assist in detecting and correcting repairs is in the 
recognition of word fragments. As shown earlier, 
fragments are exceedingly common; they occurred 
in 366 of our 607 repairs. Fragments pose diffi- 
culty for state-of-the-art recognition systems be- 
cause most recognizers are constrained to produce 
strings of actual words, rather than allowing par- 
tial words as output. Because so many repairs in- 
volve fragments, if fragments are not represented 
in the recognizer output, then information relevant 
to the processing of repairs is lost. 
We found that often when a fragment had suf- 
ficient acoustic energy, one of two recognition er- 
rors occurred. Either the fragment was misrecog- 
nized as a complete word, or it caused a recog- 
nition error on a neighboring word. Therefore if 
recognizers were able to flag potential word frag- 
ments, this information could aid subsequent pro- 
cessing by indicating the higher likelihood that 
words in the region might require deletion. Frag- 
ments can also be useful in the detection of repairs 
requiring deletion of more than just the fragment. 
In approximately 40% of the sentences containing 
fragments in our data, the fragment occurred at 
the right edge of a longer repair. In a portion of 
these cases, for example, 
"leaving at <seven> <fif-> eight thirty," 
the presence of the fragment is an especially im- 
portant cue because there is nothing (e.g., no 
matched words) to cause the pattern matcher to 
hypothesize the presence of a repair. 
We studied 50 fragments drawn at random 
from our total corpus of 366. The most reliable 
acoustic cue over the set was the presence of a 
silence following the fragment. In 49 out of 50 
cases, there was a silence of greater than 60 msec; 
the average silence was 282 msec. Of the 50 frag- 
ments, 25 ended in a vowel, 13 contained a vowel 
and ended in a consonant, and 12 contained no 
vocalic portion. 
It is likely that recognition of fragments of the 
first type, in which there is abrupt cessation of 
speech during a vowel, can be aided by looking for 
heavy glottalization at the end of the fragment. 
We coded fragments as glottalized if they showed 
irregular pitch pulses in their associated waveform, 
spectrogram, and pitch tracks. We found glottal- 
ization in 24 of the 25 vowel-final fragments in 
our data. An example of a glottalized fragment is
shown in Figure 1. 
Although it is true that glottalization occurs 
in fluent speech as well, it normally appears on 
unstressed, low F0 portions of a signal. The 24 
glottalized fragments we examined, however, were
not at the bottom of the speaker's range, and
most had considerable energy. Thus when com-
bined with the feature of a following silence of at
least 60 msec, glottalization on syllables with suffi-
cient energy and not at the bottom of the speaker's
range may prove a useful feature in recognizing
fragments. 
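The combination of cues proposed above for vowel-final fragments can be written as a single predicate, as in the sketch below. The 60 msec silence threshold is from the text; the `Segment` record and its boolean fields are hypothetical, since how energy and pitch range would be measured by a recognizer is not specified here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Acoustic summary of one candidate syllable (hypothetical record)."""
    following_silence_msec: float
    glottalized: bool
    sufficient_energy: bool
    at_bottom_of_range: bool


def may_be_fragment(segment: Segment, min_silence_msec: float = 60.0) -> bool:
    """Flag a syllable that shows all of the proposed fragment cues."""
    return (
        segment.following_silence_msec >= min_silence_msec
        and segment.glottalized
        and segment.sufficient_energy
        and not segment.at_bottom_of_range
    )


abrupt_cutoff = Segment(282.0, True, True, False)
fluent_creak = Segment(10.0, True, False, True)
print(may_be_fragment(abrupt_cutoff))
print(may_be_fragment(fluent_creak))
```

A flag of this kind would only mark a region as suspect; deciding which words to delete would still fall to the other knowledge sources.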
CONCLUSION 
In summary, disfluencies occur at high enough 
rates in human-computer dialog to merit consid- 
eration. In contrast to earlier approaches, we have 
made it our goal to detect and correct repairs au- 
tomatically, without assuming an explicit edit sig- 
nal. Without such an edit signal, however, re- 
pairs are easily confused both with false positives 
and with other repairs. Preliminary results show 
that pattern matching is effective at detecting re- 
pairs without excessive overgeneration. Our syn- 
tactic/semantic approaches are quite accurate at 
detecting repairs and correcting them. Acoustics 
is a third source of information that can be tapped 
to provide evidence about the existence of a repair. 
While none of these knowledge sources by it- 
self is sufficient, we propose that by combining 
them, and possibly others, we can greatly enhance 
our ability to detect and correct repairs. As a next 
step, we intend to explore additional aspects of the 
syntax and semantics of repairs, analyze further 
acoustic patterns, and pursue the question of how 
best to integrate information from these multiple 
knowledge sources. 
ACKNOWLEDGMENTS 
We would like to thank Patti Price for her 
helpful comments on earlier drafts, as well as for 
her participation in the development of the nota- 
tional system used. We would also like to thank 
Robin Lickley for his feedback on the acoustics 
section, Elizabeth Wade for assistance with the 
statistics, and Mark Gawron for work on the Gem- 
ini grammar. 

REFERENCES 
1. Alshawi, H., Carter, D., van Eijck, J., Moore, R.
C., Moran, D. B., Pereira, F., Pulman, S., and 
A. Smith (1988) Research Programme In Natural 
Language Processing: July 1988 Annual Report, 
SRI International Tech Note, Cambridge, Eng- 
land. 
2. Bear, J., Dowding, J., Price, P., and E. E. 
Shriberg (1992) "Labeling Conventions for No- 
tating Grammatical Repairs in Speech," unpub- 
lished manuscript, to appear as an SRI Tech Note. 
3. Hirschberg, J. and D. Litman (1987) "Now Let's
Talk About Now: Identifying Cue Phrases Into-
nationally," Proceedings of the ACL, pp. 163-171.
4. Carbonell, J. and P. Hayes (1983) "Recov-
ery Strategies for Parsing Extragrammatical Lan- 
guage," American Journal of Computational Lin- 
guistics, Vol. 9, Numbers 3-4, pp. 123-146. 
5. Hindle, D. (1983) "Deterministic Parsing of Syn- 
tactic Non-fluencies," Proceedings of the ACL, pp.
123-128. 
6. Hockett, C. (1967) "Where the Tongue Slips, 
There Slip I," in To Honor Roman Jakobson: Vol. 
2, The Hague: Mouton.
7. Levelt, W. (1983) "Monitoring and self-repair in 
speech," Cognition, Vol. 14, pp. 41-104. 
8. Levelt, W., and A. Cutler (1983) "Prosodic Mark- 
ing in Speech Repair," Journal of Semantics, Vol. 
2, pp. 205-217. 
9. Lickley, R., R. Shillcock, and E. Bard (1991)
"Processing Disfluent Speech: How and when are 
disfluencies found?" Proceedings of the Second 
European Conference on Speech Communication 
and Technology, Vol. 3, pp. 1499-1502. 
10. MADCOW (1992) "Multi-site Data Collection for 
a Spoken Language Corpus," Proceedings of the 
DARPA Speech and Natural Language Workshop, 
February 23-26, 1992. 
11. Moore, R. and J. Dowding (1991) "Efficient 
Bottom-up Parsing," Proceedings of the DARPA
Speech and Natural Language Workshop, Febru- 
ary 19-22, 1991, pp. 200-203. 
12. Shriberg, E., Bear, J., and Dowding, J. (1992a)
"Automatic Detection and Correction of Repairs 
in Human-Computer Dialog" Proceedings of the 
DARPA Speech and Natural Language Workshop, 
February 23-26, 1992. 
13. Shriberg, E., Wade, E., and P. Price (1992b)
"Human-Machine Problem Solving Using Spoken 
Language Systems (SLS): Factors Affecting Per- 
formance and User Satisfaction," Proceedings of 
the DARPA Speech and Natural Language Work- 
shop, February 23-26, 1992. 
14. Ward, W. (1991) "Evaluation of the CMU ATIS 
System," Proceedings of the DARPA Speech and 
Natural Language Workshop, February 19-22, 
1991, pp. 101-105. 
