PAUSE AS A PHRASE DEMARCATOR FOR SPEECH AND LANGUAGE PROCESSING 
JUNKO HOSAKA MARK SELIGMAN HARALD SINGER 
ATR Interpreting Telephony Research Laboratories 
Hika,ridai 2-2, Seika-cho, Sor~ku-gun, Kyoto 619-02, Jap;m 
Abstract 
In spontaneous speech understanding a sophisticated in- 
tegration of speech recognition and language processing 
is espceially crucial. However, the two modnles are tra- 
ditionally designed independently, with independent lin- 
guistie rules. In Japanese spc.ech recognition the bun- 
sctsu phrase is the basic processing unit and in language 
processing the sentence is the basic unit. This difference 
has made it impracticM to use a unique set of linguistic 
rules for both types of processing. Further, spontaneous 
speech contains unexpected utterances other than well- 
formed sentences, while lingnistic rules for both speech 
and language processing expect well-formed sentences. 
They therefore fail to process everyday spoken language. 
To bridge the gap between speech and language process- 
ing, we propose that pauses be treated as phrase demar- 
cators and that the interpausal phrase be the basic com- 
mon processing unit. And to treat the linguistic l)henoI~l- 
ena of spoken language properly, we survey relevant fea- 
tures in spontaneous speech data. We then examine the 
effect of integrating pausal and spontaneous speech phe- 
nomena into synt~tctic rules for speech recognition, using 
118 sentences. Our experiments show that incorporat- 
ing pansal phenomena as purely syntactic constraints de- 
grades recognition accuracy considerably, while the addi- 
tional degradation is minor if some filrther spontaneous 
speech features are also incorporated. 
1 INTRODUCTION 
A spontaneous speech understanding system accepts 
naturally spoken input and understands its meaning. 
hi such a system, speechprocessing and language pro- 
cessiug must be integrated in a sophisticated manner. 
Itowew:r, the integration is not straightforward, as 
the two are stndied independently art(/ have differ- 
ent processing units. Moreover, spontaneous speech 
contains unexpected phenomena, such as hesitations, 
corrections and fragmentary expressions, which thus 
far have not been treated in linguistic rules. 
The most significant concern in speech processing 
is raising the recognition accuracy. For that purpose, 
applying linguistic information, e.g. using stochastic 
models\[l l, syntactic rules\[2\], sen,antic intbrmation\[3\] 
and discourse plan@l\], is most promising. In a recent 
Japanese speech translation system\[5\] b*lnselsu-based 
syntactic constraints are successfully applied in the 
speech processing module\[6\] 1, However, rules repre- 
l A bunsetsu rouglfly corresponds to a phrase and is the next 
largest unit after the word. The nunfl)er of words in a phrase 
ranges from I to 14, art(\] the mean numl)er is al)ont 317\]. 
senting the same constraints cannot be used directly 
in sentence-based language processing, where the pri- 
mary concern is to understand sentence meaning. In 
speech recognition, a sequence of words forms a bun- 
selsu and a set of bunseisus then forms a sentence. 
In language processing, on the other hand, where 
the sentence is the basic processing unit, treating the 
main verh aud its complements is usually the core of 
processing. For the sentence kaigi ni moshikomi tai 
no desu ga, meauing 'I would like to apply for the 
conference,' the processing discrepancy is sketched in 
Figure 1: 
Speech Processing 
kaigi n, ~moshikomi\]~no desu ga 
LT I I 
.. I 
Language Processing 
\]moshikomi~ tai no dosu ga 
I I 
- 7 .~L .. 
Figure 1: Structural Difference 
Although linguistic rules for speech recognition al- 
ways cope with uncertain l)honeme hypotheses, they 
still expect well-fornmd speech input, and this is even 
more true of linguistic rules in language processing. 
In spontaneous speech, however, there are hesita- 
tions, corrections and incomplete utterances which 
are uot treated in the conventional framework. 
In addressing spontaneous speech understanding, 
two main prohlems must be solved: the absence of 
common processing components a~s sketched in Fig- 
ure 1, and our insufficient knowledge of spontaneous 
speech features. In this paper, we propose the pause 
as a phrase demarcator and the interpausal phrase 
as the basic processing unit. A phrase is natu- 
rally demarcated with pauses in spoken language and 
an interpausal phrase often functions as a meaning 
unit\[8\]\[9\], in spontaneous speech understanding we 
must both accept naturally spoken input and under- 
stand its lneaning. Use of the pause as a phrase de- 
marcator is advantageous for both of these purposes. 
Further, we investigate several frequent spontaneous 
987 
speech fleatures using spontaneous speech data\[10\]. 
We then apply tile study to speech recognition. We 
examine the effect of integrating into syntactic rules 
pausal phenomena and certain features of spoken lan- 
guage, using 118 test sentences. 
2 ANALYSIS OF SPONTA- 
NEOUS DIALOGUES 
2.1 Spontaneous Dialogue Data 
As sources of spontaneous data, we nse four Japanese 
dialogues concerning directions from Kyoto station 
to either a conference center or a hotel, collected 
in the Environment for Multi-Modal lnteraction\[10\]. 
Speaker A is pre-trained to give the directions, men- 
tioning possible transportation, location and so forth. 
Two subjects seeking directions, Speaker B and 
Speaker C, are given some keywords, such as the 
name and tim date of the conference. They may use 
telephone connections only, or may use a multimodal 
setnp with onscreen graphics and video as well. Ta- 
ble 1 shows how many words are used in tile dialogues 
studied: 
Table 1: Words in the Corpora 
Speakers A ,B 
Speakers A,C 
Subtotal 
Telephone Multimedia 
536 714 
1167 1124 
~7o3 1838 
Total 3541 
The corpora consists of 3541 words in total, and 
contains 440 different words, it has 403 turn-takings, 
and thus roughly 403 sentences. 
In the multimedia setup, speakers use deictic ex- 
pressions such as koko and kore meaning "here" and 
"this," respectively. The dialogues also la~sted longer 
than those in the telephone-only setup. Itowever, we 
did not find any further distinct differences between 
the two setups. We therefore analyse all of the dia- 
logues in tile same way. 
For our stndy, transcripts of the spontaneous di- 
alogues have been prepared, and these contain too> 
photogical tags and turn-taking information. Pause 
information within turns, i.e., breaths or silences 
longer than 400 miliseconds, is provided a~s well. 
2.2 Pause as a Phrase Demarcator 
In Table 2 we illustrate the adequacy of the inter- 
pausal phrase as a processing unit with a series of di- 
rections to Kyoto station's Karasumachou exit. 3'he 
entire explanation consists of three turns separated by 
short response syllables, snch as hat, that do not over- 
lap I,l~e explanation. That is, the speaker paused dur- 
ing these responses. We marked each turn with '/'URN 
at the end. As a primary demarcator we used pauses 
and turns. Thus either PAUSE or TURN appears in the 
second colunm. Further demarcator candidates such 
as the filled pauses anoo or Pete, the emphasis marker 
desune and the response syllable hat when overlap- 
ping the explanation appear in the third eohmm as 
FILLED PAUSE, DESUNE and RESPONSE, respectively. 
A rough translation follows each interpausal phrase: 
Table 2: Phrase Demarcator 
~2 K ~@"QL2~: 6 PAUSE FILLED PAUSE 
if it is from here 
~ 6 PAUSE 
this side 
¢-)~t~>*&'-\[:2Z)~ O "C'N ~ ~ b~ PAUSE R, ESPONSE 
you go up the stairs 
c cfo a/~o-cN~- TUaN 
you cross here all the way 
~* PAUSE 
and 
~ ,~,-¢' I~ESPONSE 
-- ~: J~ Y~JJ m PAUSE 
when you see the nezt stairs, this one, turn left, first 
~_ ~ 7-~" PAUSE DESUNE 
at this place like a crossroad which appears 
~'~cEf o~CT;~ ~ ~ 5- TURN 
turn rigM 
"(" ,~ff IC '~ "o "% I~ Iz'~ X2 " PAUSE 
and yell t'~lrTz right 
-PC c a) N~-C- 
I~g ~ -C\]*.~ ~ "~ ~- ~ PAUSE t~ESPONSE 
and lhen if you go down the stairs here 
you come out of the karasumachou emil 
The length of the processing unit plays an impel 
rant role in speech recognition. Table 2 shows that 
alternative demarcator candidates such as FILLED 
PAUSE and RESPONSE usually cooccur with pauses. 
In Table 2, for example, we find only one case where 
RESPONSE does not eooecur with a pause. Conse- 
quently, tile segments within turns bounded by these 
alternative markers would not be much different from 
those bounded by pauses; in particular, they would 
not be nan& shorter or longer. Thus, at least where 
length is concerned, the combination of PAUSE and 
TURN seems appropriate and sufficient to mark out 
phrases. With respect to language processing, Table 
2 shows that interpausal phrases are often adequate 
as translation units, which suggests that such phrases 
often function as meaning units. 
Interpausal phrases typically end with a conjunc- 
tive postposition, such ms ya or keredomo; a postpo- 
sitional phrase; an interjection, such as hat or moshi- 
moshi; the genitive postposition no for adnominals; 
988 
all adnominal conjugaL|oil forlll; ;t coor(/itmJ.e cot@l- 
gation form; ~m×iliaries with senl;ence liua\[ conjuga- 
tiol: form; or a seut,enee final l)arl.icle, such as lea or 
"ll £. 
2.3 Features of Spontaneous Dia- 
logues 
We studied t, en features of Sl)Ont~mc.ous dialogues 
which are not, consid(,red iu grammars for weal \['ormed 
senl;ences\[6\]\[I 1\]. Table 3 shows the fi'ah:res and t;hcir 
frequem:ies: 
In Ex. 2 Speaker \]3 did not; finish whag he wm,i, ed 
t.o say, but SpeMcer A m:derstood his iutention and 
inl;err:ll)ted his utterance, which is therefore fragumn- 
tary. Speaker 11 continued but, before he could liaish 
Speaker A finished for him. So Speaker B's l:tge.ra:lce 
is again \]'r:tgn:el:l, al'y. 
Ex. 3 
Speaker A: ful:aeki (le 
ad'ter I, wo stops 
Speaker H: keage 
keage 
5'peaker A: sou de,su 
that's right 
Tabh'. 3: Feature and Occurrence 
Us(: of dc,s~.ze :ff I 
Use of a~:oo 35 I 
Fragmentary ul;term~ce 2:5 \] 
IJse of ec/o 1,5 I 
End o\[" tm'n with a PP 7 : 
POStl)osition drop 7 ', 
Question without ka 5 \] 
I)isfluency: so ude.~'~n~, 51 
Apposition 1 I 
Inversion 31 
We expected a very high frequency of the \[|{led 
pauses a'0oo and celo flmctioaiag as discourse 
managers\[I2\], lloweve.r, Table 3 shows only a rood 
est frequency. Iq~ol:ological varim, ions such as utb*oo 
al:d aTio for a11oo ;Hid etlov a:ld cello \['or 0el0 were 
uot coltllted. This may be why the \['requeucy off bed: 
cxpr(..ssions ix unexpectedly low. 
Some flai, ures shown in Table :1 are disc:,ssed in 
the ('.X;-UI/I)Ie sets below. Fe.al, ures it: focus ;~re iu bold 
type: 
F,x. 1 
soch.h'a ~Io (lesmte noviba kava basu ga desune 
dele.masu 
there is a bus fl'om that bus s~,op 
"\]'he person giving dire.cdons off, e:: uses dm expres- 
sion desu~:e. The use o\[" dcsu'ne emphasiz(:s t, he pre- 
ceding utterance., typically the inlmediat.ely preceding 
miMmal phrase. In Ex. I the first use emphasizes 
sochira no and the second sl, resse.s ba.s.u yR. 
We deuol, e t, he person giving the directions as 
Sp(,akcr A aud the person seeking the infornmtion 
as Speal:er B in Examples 2, and 3. 
Ex. 2 
Speaker lk keagc no kita 
norl,h <ff keage 
.5'l;cakcv A: sou des'~l 
that's rig}it 
Speaker I~: (legnehi 
exit 
Speaker A: f~hzdcg'uchi dc,~'a ~tc 
il/s tl~e nord~ exit, okay? 
Speaker A is giving directions but before he has 
completed his ul, terancv Spealce.r B interrupts witl~ 
the station name. SpeM:er A did not continue his 
\[h'sl, utterance and agreed wit\[: Speaker B. St)e.ake.r 
A's first utterance is a non:|hal phrase, which is never 
eomlJe.ted. 
.... -4 1 - " 3 APPI,ICA\]ION OF THE 
ANALYSIS 
To e×amine the l'easibility of integrating h:to syn- 
tactic rules both p:msal phenoutena and the fi;ah:res 
0\[" SI)OIILI/:IOOIlS speech studied in Section 2, we pre- 
pared three, dill'trent sets of rules. In all three s(%s, 
rules have bee.n exl)licitly u:oditied l;o represent lmUSgd 
phel:ot:wp.a. The. first set: Pause; contains only such 
modifications, while I,he other l;wo sets add olle ad 
ditionai spont:meous 5mtut'e each: rule set Emphasis 
l>crmits llse o\[" |,he ell:l)hasis marker deswnc el'Let a 
noun phrase, while rule set Turn allows t)ostposidonal 
u(;i.erauccs at; t:he end o\[' a turn. \a?e conducted pre. 
liminary speech recoguitiou cxperiment, s with a pgLrser 
which uses linguist, ic constraints written ~us a CFC. 
(.~Ollstralrlt, s 3.\] Linguistic ~ " 
To represem; ore' underlying linguistic eonstnfints we 
adapted existiug synt;wt.ie rules developed for sl)eech 
recognition\[6\]. Earlier expcriluents using b'lutselsu- 
based sl)eech input showed 70% sent, ence reeognidon 
accuracy for tl:e top caudidat, e and 8,1% for d:c. top 5 
e:mdidates. 
The format for all of our synt, actic :':alex ix as fob- 
lows; 
(<CATI> <--> (<CAT2> <CAT3>)) 
Nonterminals are surrounded by <>. \]'he above 
rule indicates thal. CATI consists of CAT2 al:d CAT3. 
We denote the categories in interpa::sa/ phrase rules 
in lower-cruse and t, he categories in interpausal phrase- 
based se:/gellee rllieS il: upper-case. 
In the rule set Pause we prepared about d5 
l>hrases dmt can end will: a pause: postposi- 
tionaI phrases, COllj:lllCt, ive phrases, adnominM ver- 
bal phrases marked with a special conjugation form, 
989 
phrases that end with a conjunctive postposition, ad- 
nominal phrases with the genitive postposition no, 
and coordinate verbal phrases. The first three rules 
are as follows: 
(<pp-pau> <--> (<pp> <pause>)) 
(<conj-pau> <--> (<conj> <pause>)) 
(<vaux-mod-pau> <--> (<vaux-mod> <pause>)) 
In the rule set Emphasis we prepared seven addi- 
tional rules for treating the emphasis marker desune, 
represented as follows: 
(<pp-pau> <--> (<pp> <emphasis> <pause>)) 
(<pp-no-pau> <--> 
(<pp-no> <emphasis> <pause>)) 
Methods for combining interpausal phrases to ob- 
tain an overall utterance meaning require further 
study. At this stage we defined a sentence very 
loosely. It can be an interjection; an interjection 
followed by a combination of interpausal phrases; or 
simply a combination of interpausal phrases. To al- 
low fragmentary ntterances, in the rule set Turn, we 
also introduced a sentence consisting of a nominal 
phrase, which may contain adnominal phrases. Com- 
plete sentences in Turn are defined as follows: 
(<SSS> <--> (<INTERJI>)) 
(<SSS> <--> (<INTERJI> <SS>)) 
(<SSS> <--> (<SS>)) 
(<SSS> <--> (<M-NN>)) 
Table 4 shows the size and phoneme perplexity of 
the three sets of rules: 
Table 4: Size and Perplexity 
Pause Emphasis Turn 
Rules 2326 2333 2327 
Words 751 752 751 
Perplexity 3.96 3.96 3.96 
A given phoneme string can belong to several cat- 
egories. For instance, de can be a postposition or 
a copula conjugation form. The number of different 
phoneme strings is 503 for Pause and Turn, and 504 
for Emphasis. 
3.2 Speech Recognition Experiment 
We conducted a speech recognition experiment with 
118 test sentences concerning secretarial services for 
an international conference. A professional broad- 
caster uttered the sentences without any special con- 
straints such as pause placement. 
For our speech recognition parser, we used tIMM- 
LR\[14\], which is a combination of generalized LR 
parsing and Hidden Markov Models (HMM). The sys- 
tem predicts phonemes by using an LR parsing table 
and drives HMM phoneme verifiers to detect or ver- 
ify them without any intervening structure such as a 
phoneme lattice. Linguistic rules for parsing can be 
written m CFG format. 
As mentioned in section 3.1, we explicitly defined 
rules that can end with pauses in linguistic con- 
straints. According to the pause model, a pause can 
last from 1 to 150 frames, where a frame lasts 9 reset. 
Examples (1) and (2) show the results of ItMM- 
Lit. Japanese speech recognition 2. (1) shows sample 
results of rule set Pause and (2) shows sample results 
of Turn. The phoneme strings which were actually 
pronounced are enclosed in I I: 
(i) I kaiginoaNnaishowaomo chide suka I 
(Do you have a conference invitation?) 
............................................ 
I : kaigi-no-P-aNnaisyo-o-omochi-desu-ka 
2 : kaigi-ni-P-aNnaisyo-o-omochi-desu-ka 
3 : kaigi-ga-P-aNnaisyo-o-omochi-desu-ka 
> 4: kaigi-no-P-aNnaisyo-wa-P-omoehi-desu-ka 
5 : kaigi-ni-P-aNnaisyo-wa-P-omochi-desu-ka 
(2) \[iie\[ (no) 
............................................ 
1 : imi-e 
2: igo-e 
> 3: iie 
4: ima-e 
S: kigeg-e 
In the examples, the symbols >, -, N and P have 
special meaning: A correctly recognized phrase is 
marked with >. A word boundary is marked with -. 
A syllabic nasal is transcribed N. A pause is marked 
with p. 
Example (1) shows typical recognition errors in- 
volving postpositions like no, m, ga, and o, which of- 
ten receive reduced pronunciation in natural speech. 
The surounding context may aggravate the problem. 
IIere, for instance, topic marker wa is erroneously rec- 
ognized as object marker o in the environment; of pre- 
ceding and subsequent phoneme o. The possible in- 
troduction of pauses at such junctures further compli- 
cates the recognition problem. Analysis deeper than 
CFG parsing will often be needed to filter unlikely 
candidates. Example (2) demonstrates the dangers 
of allowing postpositional phrases to end utterances. 
Here, all recognition candidates other than the third 
are inappropriate postpositional phrases. To recog- 
nize the unlikelihood of such candidates, we will need 
further controls, such as discourse management. 
Our resulting sentence speech recognition accura- 
cies are shown in Table 5. For instance, using rule set 
Pause, the correct candidate was the highest rank- 
ing candidate 50.0 percent of the time, Rank 1, while 
the correct candidate was among the top ,5 candidates 
55.9 percent of the time, Rank 5. 
2The maximal amount of the whole beam width, called the 
global beam width, is set at 100, emd the maximM beau width 
of each branch, the local beam width, is 12. 
990 
Table 5: Recognition t{ate (%) 
y-  T.o T 5o /,\[o.H 
I I < 4.2 i i 
I I I i 
With the underlying linguistic rules fl)r the three 
rule sets, earlier experiments had achieved 70% sen- 
I, ence speech l:ecognition accuracy for speech input 
with explicit p~mses at bunsets'u bonndaries. Our best, 
present results tbr spontaneous speech are much more 
modest: 50%. 
'l'~d~le 5 shows that the introduction of the empha- 
sis marker des'uric did not affect processing: as seen in 
Table 4, rule set Emphasis has a slightly higher per- 
plexity than Pause, but we had ex~(:tly the same re- 
sues for the two. On I;he other hand, the perplexities 
of Pause and Turn ~re identical, but the treattnent of 
fragmentary utterances did decrease recognition ac- 
Clll:acy. 
4 CONCLUSION 
2'o treat spontaneous speech understauding we have 
two main problems: the absence of a common pro- 
ceasing unit gJ.lld insuflieieilt knowle.dge of spouta- 
rictus speech fcatarea. 
We have proposed pauses as i)hrase detYlarcatol's 
and interpausM phrases as common processing units 
to allow integration of speech recognition and lan- 
guage processing in the processing of spontaneous 
speech understand\[us. We demonstrated the adwm- 
gages of processing based on iutcrpausaI phrases using 
examples taken from spontameous speech dialogues 
containing 3,541 words. Using the same data, we 
studied certain features of spoken language, such as 
tilled pauses and fragmentary utterances. Based on 
the study, we prepared three difDrent CFG rule se.ts 
for preliminary speech recognition experiments. In 
all three sets, rules have been e×plicitly modified to 
represent pausal phenomena. Tiw. first set eolltaiiis 
only such modifications, while the other two sets acid 
tile addit, ional spontaneous feature each: rise of the 
emphasis marker desune after a noun phrase or post- 
positional utterances at the end of a turn. For 118 
sel/tences, sel/tence reco~llitioll acctlracy \['or pause- 
based rules was considerably less than the accuracy 
obtidned in earlier buTiseisu-based tests using manda- 
tory pauses at b~tn.selslt boundaries; but flirt, her loss 
of accuracy caused by incorporating the spontaneous 
features was minor. 
We believe that the loss of speech recognition ac- 
curacy for sentences seen in our pause-based exper 
iments is largely due to the difficulties of eombin- 
lug interpausaI phrase hypotheses. Our r/lies cur- 
reiltly eombine interpausal phrases in a relatively un- 
constrained lllS.unerl tlsillg only weak syutactic COll- 
straiuts. Based vn filrther study of the structures 
which precede and follow pauses or filled pauses, we 
hope t.o provide stronger syntactic constraints in the 
ftit'dre. 
5 ACKNOWLEDGEMENTS 
Wc wish to thank \])r. Y. Yamazaki, President of 
ATR-ITL, 2'. Morimoto, Ilead of Department 4, and 
many of our \[TL colleagues for their generous support 
slid ellcollragelilell t. 
References 
\[1\] Lee, K.-F. and Iton, \[\[.-W.(1988): "Large- 
VocMmMry Speaker-independent Continuous Speech 
Recognition Using \[\]MM," Prec. of ICASSP-88, 
pp. 123-126. 
\[2\] Ney, II,(\]987): "l)ymmfic t'rogrammlng Speech 
Recognition Using a (\]ontexl.-Free Grammar," Proc. 
of IC, ASSP-87, pp.69-72. 
\[3\] Matsunaga, S., Sagayama, S., Honmia, S. and Furui, 
S.(1990): "A Continuous Speech Recognition System 
Based on a Two-Level Grammm: Approach," Pro<:. of 
ICASSP-90, pp.589-592. 
\[4\] Yamaok~h T. and lida, H.(19.90): "A Method to Pre- 
dict the Next Utterance \[)'sing it Four-layered Plan 
Recognition Model," Prec. e\[ ECAL90, pp.726-731. 
\[5\] Morimoto, T., Takezawa, T., Yato, F., ct M.(1993): 
"AG'IUs Spec'ch G'ransb~tion System: ASUH A," Prec. 
of Eurospcech-93, Vol.2, pp.129\]-t294. 
\[6\] \[\[osaka, J., TMcezawa, T.(1992): "Construction of 
corpus-based syntactic rules for accurate speech 
recognition," Prec. of COtiNG-92, pi,.806-812. 
\[7\] Ehara., '1'., Ogura, IC, Mot\[mote, T. (1990): "ATR 
l)ia.logue \])atahase," Prec. of ICSLI>-90, pp. 1093- 
1096. 
\[8\] Fodor, J., Bever, %(1965): "'Fhe psychological real- 
icy of linguistic segments," Journal of Verbal Learn- 
ing aud Behavior, pp. 4:414-420. 
\[9\] Sugito, M.(t988):"Pause and intonation in dis- 
course," Nihongo to nihongo kyouiku, Vol.2, pp.343-. 
363 (in Japanese). 
I,oken-Kim, K., Yato, F., et a1.(1993): EMMI- 
ATR environment for multi-roods{ inter~Lction, q'T- 
IT-0081, A'\['R. 
llesak~h 3.(1993): A (Iramlmtr for Japanese Genera- 
tion in l, he TUG Fr;tmework, TechnicaJ Report TIL 
1-0346, A'I'IL 
Sadanobu, T., Takubo, Y.(;1993): "The Discourse 
M~nagement Function of Fillers -a ca.se of "eeto" and 
"ant(o)'>-, Prec. of ISSD-93, pp.271-274. 
Hosaka, J., '\['akezawa, 'l'., Uratani, N.(1992): "An- 
alyzing Postposition \[)tops in Spoken Japanese," 
Prec. of l(3SLP-92, Vol.2, pp.1251q254. 
Kita, K., Kawabala, T., Saito, li. (1989): "HMM 
Continuous Speech l{ecogniton UsiiIg Predictive LI{ 
Parsing," Prec. of ICASSP-89, pp.703-7\[)6. 
\[10\] 
\[1~\] 
\[12\] 
\[13\] 
\[14\] 
991 
