A GRAMMAR AND A PARSER FOR 
SPONTANEOUS SPEECH 
Mikio Nakano, Akira Shimazu, and Kiyoshi Kogure 
NTT Basic Research Laboratories 
3-1 Morinosato-Wakamiya, Atsugi-shi, Kanagawa, 243-01 Japan 
{nakano, shiraazu, kogure}~atora.nt'c.jp 
ABSTRACT 
This paper classifies distinctive phenomena occur- 
ring in Japanese spontaneous speech, and proposes 
a grammar and processing techniques for handling 
them. Parsers using a grammar for written sentences 
cannot deal with spontaneous speech because in spon- 
taneous speech there are phenomena that do not occur 
in written sentences. A grammar based on analysis of 
transcripts of dialogues was therefore developed. It 
has two distinctive features: it uses short units as 
input units instead of using sentences in grammars 
for written sentences, and it covers utterances includ- 
ing phrases peculiar to spontaneous speech. Since the 
grammar is an augmentation of a grammar for writ- 
ten sentences, it can also be used to analyze complex 
utterances. Incorporating the grammar into the dis- 
tributed natural language processing model described 
elsewhere enables the handling of utterances includ- 
ing variety of phenomena peculiar to spontaneous 
speech. 
1 INTRODUCTION 
Most dialogue understanding studies have focused on 
the mental states, plans, and intentions of the par- 
ticipants (Cohen et al., 1990). These studies have 
presumed that utterances can be analyzed syntacti- 
cally and semantically and that the representation of 
the speech acts performed by those ntterances can 
be obtained. Spontaneonsly spoken utterances differ 
considerably from written sentences, however, so it is 
not possible to analyze them syntactically and seman- 
tically when using a grammar for written sentences. 
Spontaneous speech, a sequence of spontaneously 
spoken utterances, can be distinguished from well- 
planned utterances like radio news and movie dia- 
logues. Mnch effort has been put into incorporating 
grammatical information into speech mlderstanding 
(e.g., Hayes et el. (1986), Young et al. (1989), Okada 
(1991)), but because this work has focused on well- 
planned utterances, spontaneously spoken utterances 
have received little attention. This has partly been 
due to the lack of a grammar and processing technique 
that can be applied to spontaneous speech. Conse- 
quently, to attain an understanding of dialogues it is 
necessary to develop a way to analyze spontaneous 
speech syntactically and semantically. 
There are two approaches to developing this kind 
of analysis method: one is to develop a grammar 
and analysis method for spontaneous speech that do 
not depend on syntactic constraints as much as the 
conventional methods for written sentences do (Den, 
1993), and the other is to augment the grammar used 
for written sentences and modify the conventional 
analysis method to deal with spontaneous speech. 
The former method would fail, however, when new in- 
formation is conveyed in the utterances; that is, when 
the semantic characteristics of the dialogue topic are 
not known to the hearer. In such cases, even ill a 
dialogue, the syntactic constraints are nsed for un- 
derstanding utterances. Because methods that dis- 
regard syntactic constraints would not work well in 
these kinds of cases, we took the latter approach. 
We analyzed more than a hundred dialogue tran- 
scripts and classified the distinctive phenomena in 
spontaneous Japanese speech. To handle those phe- 
nomena, we develop a computational model called L'n- 
semble Model (Shimazu et al., 1993b), in which syn- 
tactic, semantic, and pragmatic processing modules 
and modules that do combination of some or all of 
those processing analyze the input in i)arallel and in- 
dependently. Even if some of the modules are unable 
to analyze the input, the other modules still output 
their results. This mode\] can handle various kinds of 
irregular expressions, such as case particle omission, 
inversions, and fragmentary expressions. 
We also developed Grass-.\] ( GT"ammar 
for spontaneous speech in Japanese), which enables 
the syntactic and semantic processing modules of t~he 
Ensemble Model to deal with some of the phenomena 
peculiar to spontaneous speech. Since G~'ass-.\] is an 
augmentation of a grammar used to analyze written 
sentences (Grat-J, Gr'ammar for lexts in Japanese), 
Crass-Y-based parsers can be used for syntactically 
complex utterances. 
There are two distinctive features of' G~'ass-J. One 
is that its focus is on the short units in spontaneous 
speech, called utter'auce units. An utterance uniL in- 
stead of a sentence as in Gral-J is used as a gram- 
matical category and is taken as the start symbol. A 
Grass-J-based parser takes an utterance unit as in- 
put and outputs the representation of the speech act 
(illoeutionary act) performed by the unit. The other 
distinctive feature is a focus on expressions peculiar 
to spontaneous speech, and here we explain how to 
augment (h'at-J so that it can handle them. Pre- 
vious studies of spontaneous speech analysis have fo- 
cused mainly on repairs and ellipses (Bear et el., 1992; 
l,anger, 1990; Nakatani & Hirschberg, 1993; Otsuka 
~; Okada, 1992), rather than expressions peculiar to 
spontaneous speech. 
This paper first describes Grat-J, and then classi- 
ties distinctive phenomena in Japanese spontaneous 
speech. It then describes Grass-Y and presents sev- 
eral analysis examples. 
1014 
1. Subcategorization rule 
Rule for NP (with particle) -VP constructions. 
M~CH 
(M head) = (FI head) 
(14 subcat) = (M subcat) U (C) 
(M adjacent) --- nil 
(H adjacent) = nil 
(M adjunct} = (kl adjunct} 
(M lexical) -- 
(M sere index) ~ (H sere index) 
(M sere restric) 
= (C sere restric) u (H sere restric) 
Symbols M, C, and tt are not names of categories but 
variables, or identifiers of root nodes in the graphs rep- 
resenting feature structures. M, C, and H correspond 
to mother, complement daughter, and head daughter. 
The head daughter's subcat feature value is a set of 
feature structures. 
2. Adjacency rule 
Rule for VP-AUXV constructions, Nf x particle construe 
tioIlS, etc, 
M-+AH 
(M head) = {H head) 
(M subeat} -- (I\] subcat} 
(U adjacent) = (A) 
(M adjacent) :- nil 
(M adjunct) :: (H adjunct} 
(M lexlcal} - - 
(M sere index} = (H sere index) 
(M sem restric} 
= (A sem restric) U 04 sere restric) 
M, A, and H correspond to mother, adjacent daughter, 
and head daughter. The head daughter's adjacent fea- 
ture value is unified with the adjacent daughter's feature 
strtlcture. 
3. Adjunction rule 
Rule for modifier modifiee constructions. 
M~AH 
(M Imad) = (H he)el) 
(M subcat) = (H subcat) 
(H adjacent) = nil 
(A adjunct) = {H) 
(M lexical) = -- 
(M sere index) -- {H sere index) 
(M sere restric) 
= (A sem restric) U (H sem restric) 
M, A, and H correspond to motlmr, adjunct daughter 
(modifier), and head daughter (modifiee), Tile adjunct 
daughter's adjunct feature value is the feature structure 
for the head daughter. 
Fig. 1: Phrase structure rules in (;rat-.\]. 
2 A GRAMMAR FOil. WRITTEN 
SENTENCES 
(TrM-3, a grammar for writte.n sentences, iv a uni- 
fication grammar loosely based on Japanese phrase 
structure gr~mlnar (JI'SG) (Gunji, 1986). Of Lhe six 
phrase structure rules used in Gral-J, the three related 
to the discussion in the following sections are shOWll 
in Fig. 1 in a I)A'l'll.d\] like notation (Shieber, 1986)) 
\],exica\] items are. represented by feature structures, 
and example of which is shown in Fig. 2. 
Grat-J-based p~trsers gellerate SOlllalll, iC representa- 
1 lhtles for relative cirCuses ~tl,d for verb-phr~tse coordi- 
mttions are not showll here. 
he~d \[ 
sub<:at { 
~,lj;t(:ent rill 
~djun(:t nil 
lexical yes 
selll \[ 
pus vet b 1 
infl sentence-final } 
hea.d llOlIII 
c~se g& (NGM) 
sere \[index *x \] 
head \[IO1111 
(:~ts(! o (A CC,) 
sere \[ index *y \] 
index *e J~ \] f (k,ve *e) "1 
restric { (~tgent *e *x) 
\[ (p~tient *e *y) 
Fig. 2: Feature strueture for the word 'aisuru' (low.). 
lions in logical ff)rm in l)avidsonian style. The se- 
ina.ntic represealtation ill each lexical item eonsisls of 
a wu'iable ealled ;m inde,: (feature, (sent index}) ;rod 
restrictions i)laced on it, (feature (selll restric)). Every 
time a l)hrase, structure rule is ~q)lflied, l, hese restrie 
tions ~tre aggregated and a logical form is synthesized. 
For exumple, let us ~gain consider 'aisuru' (love). 
If, in the feature structure for the phr;me 'Taro ga' 
(Taro-NOM), the (sen, index) value is *p a.nd gl~,, 
(sere restrie) value is {(taro *p)}, after the subc.at- 
egorization rule is al)plid the {sere restric) v~due ill 
the resulting feature str/lcture for the phrase "\['aro ga 
ais.rlC (%,'o 'oves} i~ {(~ro *x) 0ov,, *e) (ag<~t *e 
*x) (patient *e *y)}. 
(Trat-,! cowers such fundamental Jal)~mese l)henom - 
ena as subcategorizal.ion, passivization, interrogatiou, 
coordination= and negation, and also covers copulas, 
relative clauses, and conjunctions. We developed a 
parser based on (;rat-,l by using botton>u I) eha.rt 
pursing (Kay, 1980). Unification operations are per- 
formed by using constraint projection, ,Ul efficient 
method for unifying disjunctive lhature descriptions 
(Nakano, 1991). The l)arser is inq)lemented in Lucid 
(',ommon Lisp ver. 4.0. 
3 DISTINCTIVE PHENOMENA IN 
,IAPANESE SPONTANEOUS SPEECtI 
3.1 Classification of PhcImmena 
We analyzed 97 telephone dialogues (about 300,000 
bytes) ~d)out using ldli!\]X to pl'epare docunmnts and 
26 dialogues (about i6(),O00 bytes) obtained from 
three radio lisl;ener call-in programs (Shimctzu et al., 
1993a). We found that a.ugmentiltg the gr~:mmlal's aud 
analysis methods requires taking into acconllL &{, least, 
the following six phenomena in Japanese spontaneous 
speech. 
(1)\[) expressions peculiar to Japanese spontaneous 
speech, including fillers (or hesitations). 
(ex.) 'etto aru ndesnkedomo ._ ' 'kono fMru tar _, 
...' (wel\], we haw'~ thenl.., this file is...) 
(i)2) ll~rticlc (ease pnrtiete) omission 
(ex.) 'sore w,u.ashi y'a,'imasu' (I will do it.) 
0)3) matin verb ellipsis, or fragmentary ul, l, erances 
1015 
(ex.) 'aa, shinkansen de Kyoto kara.' (uh, from 
Kyoto by Shinkansen line.) 
(p4) repairing phrases 
(ex.) 'ano chosya be, chosya no arufabetto jun 
ni naran da, indekkusu naai?' (well, are there, 
aren't there indices ordered alphabetically by au- 
thors' names?) 
(p5) inversion 
(ex.) 'kopii shire kudasai, sono ronbun.' (That 
paper, please copy.) 
(p6) semantic mismatch of the theme/subject and the 
main verb 
(ex.) 'rikuesuto no uketsnkejikan wa, 24-jikan 
jonji uketsuke teori masu.' (The hours we receive 
your requests, they are received 24 hours a day.) 
3.2 Treatment of the Phenomena by the 
Ensemble Model 
These kinds of phenomena can be handled by the En- 
semble Model. As described in Section 1, the En- 
semble Model has syntactic, semantic, and pragmatic 
processing modnles and modules that; do combination 
of some or all of those proeessings to analyze the in- 
put in parallel and independently. Their output is 
unified, and even if some of the modules are unable 
to analyze the input, the other modules output their 
own resnlts. This makes the Ensemble Model robust. 
Moreover, even if some of the modules are nnable to 
analyze the input in real-time, the others output their 
results in real-time. 
'\['he Ensemble Model has been partially imple- 
mented, and Ensemble/Trio-I consists of syntactic, 
semantic, and syntactic-semantic modules, it can 
handle (p2) above as described in detail elsewhere 
(Shimazu et al., 1993b). Phenomena (p3) through 
(p6) can be partly handled by another implemen- 
tation of the Ensemble Model: Ensemble/Quartet- 
1, which has pragmatic processing modnle as well as 
the three modules of Ensemble/'lMo-I. The pragmatic 
processing module uses plan and domain knowledge 
to handle not only well-structured sentences bnt also 
ill-structured sentences, such as those including inver 
sion and omission (Kognre et al., 1994). 
To make the system more robust by enabling the 
syntactic and semantic processing modules to han- 
dle phenomena (pl) and (p3) through (p6), we in- 
corporated Grass-g into those modnles. Grass-J dif- 
fers fl:om Grat-J in two ways: Grass-J has lexieal en- 
tries for expressions peculiar to spontaneous speech, 
so that it can handle (pl). And because sentence 
boundaries are not clear in spontaneous speech, it uses 
tile concept of utterance unit (Shimazu et al,, 1993a) 
instead of sentence. This allows it to handle phenom- 
ena (p3) through (p6). For example, an inverted sen- 
tence can be handled by decomposing it, at the point 
where the inversion occurs, into two utterance units. 
Fig. 3 shown the architecture of Ensemble/Quartet- 
I. Each processing module is based on the bottom- 
up (:hart analysis method (:Kay, 1980) and a disjunc- 
tive feature description unification method ealled con- 
straint projection (Nakano, 1991). The syntactic-. 
semantic processing module uses Grass-J, the syntac- 
tic processing module uses Grass-J without seman- 
tic constraints such as sortal restriction, the seman- 
A: 1 anoo kisoken 
well the Basic Research Labs. 
eno ikileala o desu.ne 
to how to go ACC 
'well, how to go to the Basic Re- 
search Labs.' 
B: 2 hal 
nh-h uh 
:1111-}1/111' 
A: 3 eholto shira nai ride 
well know NOT because 
%ecause l don't know well' 
4 oshie teitadaki tai ndesukedo 
tell IIAV E-A-FAVOR. want 
'I'd like you tell me it' 
Fig. 4: Dialogue I. 
tic processing moduh', uses Crass-.) without syntactic 
constraints such as case information, and the prag- 
matic processing module uses a plan-based grammar. 
4 A GRAMMAR, FOR SPONTANEOUS 
SPEECH 
'\['his section describes Grass-Z 
4.1 Processing Units 
'Sentence' is used as the start symbol in granunars 
for written languages but sentence boundaries are not 
clear in spontaneous speech. ;Sentence' therefore can 
not be used as the start symbol in grammars lbr spon- 
taneous speech. Many studies, though, have shown 
that utterances are composed of short units (I,evelt, 
1989: pp. 23-.24), that need not be sentences in writ- 
ten language. Grass-3 uses such units instead of sen- 
tences. 
Consider, for example, Dialogue 1 in Fig. 4. Ut- 
terances 1 and 3 cannot be regarded as sentences in 
written language. Let us, however, consider 'hal' in 
Utterance 2. It expresses participant B's confirma- 
tion of the contents of Utterance 1. 2 Each utterance 
in Dialogue 1 can thus be considered to be a speech 
act (Shimazu et al., 1993a). These utterances are pro-- 
cessing nulls we call "utlerance units. They are used in 
Grass-J instead of the sentences used in Grat-J. One 
feature of these units is that 'hal' can be. intel\jected 
by the hearer at the end of the unit. 
The boundaries for these units can be determined 
by using pauses, linguistic clues described in the next 
section, syntactic form, and so on. In using syntactic 
\[brm to determine utterance unit boundaries, Crass- 
J first stipulates what an utterance unit actually is. 
This stipulation is based on an investigation of dia- 
logue transcripts, and in the current version of Grass- 
.\], the following syntactic constituents are recognized 
as utterance units. 
• verb phrases (including auxiliary verb phrases 
and adjective phrases) that may be followed by 
=The roles of 'hal', ~tn interjectory response correspond- 
ing to ~ back-channel utterance sueh &s uh-huh in En- 
glish but which occurs more frequently in Japanese di;> 
logue, axe discussed in Shimazu et at. (1993~t) ~tnd I(~tt~tgiri 
(1993). 
1016 
~ed ~ Result 
Fig. 3: Architecture of Ensemble/Quartet-l. 
conjunctive particles and sentence-final particles 
• noml phrases, which may be followed by particles 
• interjections 
• conjunctions 
Grass-J it:chides a bundle of phrase structure rules 
used to derive speech act re.presentation from the logi- 
cal form of these (:onstituents. A Grass-J-based parser 
inputs an utterance unit and outputs the rel)resentt¢ 
lion of the speech act performed by the unit, which is 
then input to the discourse processing system. 
Consider the following simple dialogue. 
A: 1 genkou o 
manuscript ACC 
'The manuscript' 
B: 2 hal 
uh-huh 
'nh-huh ' 
A: 3 okut tekudasai 
send please 
'please send hie' 
The logical form for i)tteranee 1 is ((mannscript *x)), 
so that its resulting speech act representation is 
(l) ((r~,ter %) (agent *e *4 (speaker *4 (ol,ject *,, 
*x) (manuscript *x)):' 
or, as written in usual notation, 
(2) l{el>r(speaker, ?x:manuscript(?x)). 
In the same way, the speech act representation for 
Utterance 3 is 
(3) Request(speaker, hearer, send(hearer, speaker, ry)) 
The discourse processor would find that '?x in (2) is 
the same as ?y in (3). A detailed explanation of this 
discourse processing is beyond the scope of this paper. 
a'liefer' st~nt(Is for the surface referring in Alien a.nd 
Perrault (\] 980). 
4.2 Treatment of Expressions Peculiar to 
Spontaneous Slme, eh 
Classification 
The underlined words in l)iak)gue 1 in Fig. d do not 
normally appear in writLen sentences, We analyzed 
the dialogue transcripts to identify expressions that 
kequently appear in spoken sentences which includes 
spontaneous speech but that do not appear in written 
sentences, and we cleLssitied them as follows. 
1. words plmnologically dif\[erent Dora those in writ- 
ten sentences (words in parenthesis are corre- 
sponding written-sentence words) 
(ex.) 'shinakya' ('shinakereb£, if someone does 
not do), 'shichau' ('shiteshimatf, have done) 
2. fillers (or hesitations such as well in l!;nglish) 
(ex.) 'etto', 'anoo' 
3. particles peculiar to spoken langnage 
(ex.) 'tte', 'nante', %oka' 
4. interjectory particles (words inserted interjecto- 
rily after noun phrases and adverbial/adnominal- 
form verb phrases) 
(ex.) ~llel~ Cdesllne~ :sa ~ 
5. expressions introducing topics 
(ex.) '(ila)lldeSllkedo', '(\[la) i|desukedon,(,', 
'(n a) 12 des uga' 
6. words appearing after main verb phrases 
(these words take l;he sentence-final form of 
verbs/auxiliary verbs/adjectives) 
(ex.) 'yo', 'ne', 'yone', 'keredo', 'kedo', ~kere- 
domo', 'ga', 'kedomo', 'kate' 
Nagata and Kogure (1990) addressed Jai)anese 
sentence-final expressions peculiar to spoken J N)anese 
sentences but (lid not deal with all the spontaneous 
speech expressions listed above. These. expressions 
may be analyzed morphologica.lly (Takeshita & Fukn 
naga, 1991). Because some expressions peculiar to 
spontaneous sl)eecb do not affect the propositiomd 
1017 
content of the sentences, disregarding those expres- 
sions might be a way to process spontaneons speech. 
Such cascaded processing of morphological analysis 
and syntactic and semantic analysis disables the in- 
cremental processing required for real-time dialogue 
understanding. Another approach is to treat these 
kinds of expressions as extra, 'noisy' words. Although 
this can be done by using a robust parsing technique, 
such as the one developed by Mellish (1989), it re- 
quires the sentence to be processed more than two 
times, and is therefore not suitable for real-time dia- 
logue understanding. In Grass-J these expressions are 
handled in the same way as expressions appearing in 
written language, so no special techniqm~.s are needed. 
Words phonologically different froin corre- 
sponding words in written-language 
The words 'tern' and 'ndesu' in 'shit tern ndesu 
ka' (do you know that'?) correspond semantically to 
'teirn' and 'nodesu' in written sentences. We investi- 
gated such words in the dialogue data (Fig. 5). One 
way to handle these words is to translate them into 
their corresponding written-language words, but be- 
cause this requires several steps it is not suitable for 
incremental dialogue processing. We therefore regard 
these words as independent of their corresponding 
words in written-language, even though their lexical 
entries have the same content. 
Fillers 
Fillers such as 'anoo' and 'etto', which roughly cor- 
respond to wellin English, appear fl'equently in spore 
taneous speech (Arita et al., 1993) and do not affect 
the propositional content of sentences in which they 
appear 4 . One way to handle them is to disregard them 
after morphological analysis is completed. As noted 
above, however, such an approach is not suitable for 
dialogue processing. We therefore treat them directly 
in parsing. 
In Grass-J, fillers modify the following words, what- 
ever their grammatical categories are. The feature 
structure for fillers is as follows. 
head \[pos interjection\] 
su beat { } 
adjunct \[ lexical +\] 
adjacent nil 
lexical 4- 
sem \[ restric {}\] 
The value of the feature lexicaI is either + or -: it 
is + in lexical items and - in feature structures for 
phrases colnposed, by phrase structure rules, of sub- 
phrases. Because these words do not affect proposi- 
tional contents, the value of the feature (sere restric) 
is empty. 
For exalnple, let us look at the parse tree for 'etto 
400-yen desu' (well, it's 400 yen). Symbols I (Interjec- 
tion), NP, and VP are abbreviations for the complex 
feature structures. 
4Although Sadanobu and 'TPakubo (1993) investigated 
the discourse management function of fillers, we do not 
discuss it here. 
\[. expressions related to aspects 
teku (teiku in written-language), teru (teiru), chau 
(tesimau), etc. 
2. expressions related to topic marker 'wa' 
cha (tewa), char (tewa), ccha (tewa), .jr (dewa), etc. 
3. expressions related to conjnnetive particle 'ha' 
nakerya (nakereba), nakya (nakereba), etc. 
4. expressions related to formal nouns 
n (no), nmn (nmno), toko (tokoro), etc. 
5. demonstratives 
kocchi (kochira), korya (korewa), so (son), soshi- 
tara (soushitara), sokka (souka), socchi (sochira), son 
(sono), sore.jr (soredewa), sorejaa (soredewa), etc. 
6. expressions related to interrogative pronoun nani 
nanka (nanika), nante (nanito), etc. 
7. other 
mokkai (mouikkai), etc. 
Fig. 5: Words that in spoken language differ from 
corresponding words in written language. 
VP 
NP V 
I 
I N desu 
r 
etto 400-yen 
'Phe filler 'etto' modifies the following word '400- 
yen' and the logical form of the sentence is the same as 
that of '400-yen desn'. 
Particles peculiar to spoken language 
Words such as 2;te' in 'Kyoto tte Osaka no tsugi 
no eki desu yone' (Kyoto is the station next to Osaka, 
isn't it?) work in the same way a~s c~>e-marking/topic- 
marking particles. Because they have no correspond- 
ing words in written language, lexical entries for then, 
are required. '\['hese words do not correspond to any 
specific surface case, such as 'ga' and %'. I,ike t, he 
topic marker 'wa', the semanl ic relationships they ex- 
press depend on the meaning of the phrases they con- 
nect. 
Interjectory particles 
Intmjectory particles, such ~%s 'ne' and 'desune', for 
low noun phrases and adverbial/adnominal-form verb 
phrases, and they do \]lot affect tile meaning of tile 
utterances. The intmjeciory particle 'he' differs from 
the sentence-final particle 'ne' in the sense that the 
latter follows sentence-final form verb phrases. These 
kinds of words can be treated by regarding them as 
particles Ibllowing noun phrases and verbs phrases. 
The following is the feature structure for these words. 
head "1 
subcat { } 
adjunct nil 
adjacent\[ head*\] \] sere \[ index *2\] 
lexical + 
\[index "2 \] sem restric { } 
1018 
The interjectory particles indicate the end of utte> 
ante units; they do not appear in the nliddle of utter- 
ante units. They flmetion ~us, so to Sl)eak, utterane(> 
unit-final t)articles. Therefore, a noun phrase followed 
by an interjectory particle forms a (surface) referring 
speech act in the same. way as noun phrase utter- 
ances, hH;er.jectory particles add nothing to logical 
forms. For example, the speech act representation of 
'genkou o desune' ix the. same as (2) in Section 4.l. 
Expressions introducing topics 
As in Uttermtce 4 of l)ialogne 1, an expression 
such as (,,a)r, des,,k,~do(,,~o) frequently apl,ears in di- 
alogues, especially in the beginning. This expres- 
sion introduces a new topic. One way t.o handle 
an expression such as this is to break it. down into 
na + ndesu + kcdo F m.o. This process, however, pre 
vents the system fronl detecting its role in topic intro- 
(luction. We therefore consider each of these expres- 
sions to be one word. 'l'he reason these expression 
are used is to make a topie explicit, by introdncing a 
discourse referent ('Phomason, 1990). Consequently, 
an 'introduce-topic' speech act is formed. These ex- 
l)ressions indicate the en(I of an utterance unit as an 
interjectory particle. 
Words aplmaring after main verb phrase 
\[t has already been pointed out that; sentence-\[inal 
|)articles, such as 'yo' and 'ne', Dequently app(:ar in 
spoken Japanese sentences (Kawamori, 1991). Con- 
junctive particles, such as qwAo' and 'kara', are also 
used as sentenee-.final pa.rticle.s (\[h)saka et ah, 1991) 
and thc'y m:c treated as such in Grass-J. They perform 
the function of anticipating the heater's reaction, as 
a trial cxt)ression does (Clark &. Wilkes-(\]ibbs, 19!10). 
'\]'hey Mso indicate the end of utterance units. 
5 ANALYSIS EXAMPLES 
Below we show results obtained by using a Grass- 
J-based parser to analyze some of the utterances in 
Dialogue 1. U (J means the utterance refit category. 
® Utterance I: 'anoo kisoken eno ikikata o desune' 
(*veil, how to go to the Basic Research l,al)s.) 
parse tree: 
OO 
NP 
NP P 
NP P desune 
f 
NP N o 
Jf~_ I 
NP P ikikata 
I N eno 
I I 
anoo kisoken 
speech act representation: 
index = *X29 
restriction = 
((REFER *X29) (OBJECT *X29 *X30) 
(AGENT *X29 *X31) (SPEAKER *X31) 
(BASIC-RESEARCH-LABS *X32) 
(DESTINATION *X30 *X32) 
(HOW-TO-GO *X30) ) 
• Utterance 4: 'oshie teitadaki tai ndesukedo' (I'd 
like you to tell me it) 
parsc t, ree: 
OU I 
VP 
VP P 
VP AUXV ndesukedo 
I 
V AUXV tai 
I I 
oshie teitadaki 
speech act representation: 
index = *X777 
restriction = 
( (INTRODUCE-TOPIC *XYYY) 
(OBJECT *XT(7 *X778) 
(AGENT *X777 *xggg) 
(SPEAKER *X779) 
(TELL *X780) 
(AGENT *X780 *X808 
(OBJECT *X780 *X809 
(PATIENT *X780 *XOI0 
(HAVE-A-FAVOR *X784 
(OBJECT *X784 *X780 
(AGENT *X784 *X81J,) 
(SOURCE *X784 *X808 
(WANT *X718) 
(OBJECT *X778 *X784 
(AGENT *X778 *X811)) 
6 CONCLUSION 
We have developed a grammar, called Cras.s-.\], 
for handling distinctive phenomena in spontaneous 
speech. '\['he grammatical analysis of spontaneous 
speech is useful in combildng the fruits of dialogue 
undersl, anding research and those of speech pro 
cessing research. As describ(:d earlier, GrassoJ is 
used as the grammar tbr the experimental systems 
t~;nsemble/rli'io - 1 ancl l'hmembleffQuartetq, which are 
based on the Ensemble Model. It enables the pro- 
cessing of several kinds of spontaneons speech, such 
as that lacking particles. 
We focused on processing ~rans('ripts because a 
grammar and an analysis method for spontaneons 
speech can be combined with speech processing sys- 
tems more accurately than (:art those for written lan- 
guages. 
Finally, a.lthough we {b(;used only on Japanese'. 
spontaneous Sl)eech , mosl, of the techniques described 
in this paper can also be used 1,o analyze spontaneous 
speech in other languages. 
ACKNOWLEDGEMETNS 
We thank Chung Pal l,ing, Yuiko Otsuka, Miyoko 
Sou, Kaeko Matsuzawa, and Sanae Nagata, for help- 
ing us analyze dialogue data. 
1079 
REFERENCES 
Allen, J. F., & Perrault, C. R. (1980). Analyzing 
Intention in Utterances. Artificial Intelligence, 
t5, 143 178. 
Arita, H., Kogure, K., Nogaito, I., Maeda, H., & 
Iida, II. (1993). Media-Dependent Conversation 
Manners. In SIG-NL-t;I, h~jbrmation Process- 
ing Society of Japan. (in Japanese). 
Bear, J., l)owding, ,1., & Shriberg, 1",. (1992). In- 
tegrating Multiple Knowledge Sources for the 
Detection and Correction of Repairs in Haman- 
Computer Dialog. In ACL-92, pp. 56 63. 
Clark, II. H., & Wilkes-Gibbs, D. (1990). Referring 
as a Collaborative Process. In Cohen, P. R., 
Morgan, J., & Pollack, M. E. (Eds.), Intentions 
in Communication, pp. 463- 493. MIT Press. 
Cohen, P. 11,., Morgan, J., & Pollack, M. t!',. (Eds.). 
(1990). Intentions in Communication. MIT 
Press. 
Den, Y. (1993). A Study on Spoken l)ialogue Gram 
mar. SIG-,5%UD-9302-5, Japanese Society of 
AI, 33 .4:0. (in Japanese). 
Gnnji, '1'. (1986). Japanese Phrase Structure Gram- 
mar. Reidel, Dordrecht. 
llayes, P. J., Hauptmann, A. C., Carbonell, J. (k, & 
'\[bmita, M. (1986). Parsing Spoken Language: 
A Semantic Caseframe Approach. in COLING- 
86, pp. 587 592. 
ttosaka, J., Takezawa, T., & Ehara, '1'. (1991). 
Constructing Syntactic Constraints for Speech 
l\],ecognition using Empirical l)ata. In SIG-NL- 
83, Information Processing Society of Japan, 
pp. 97 104. (in Japanese). 
Katagiri, Y. (1993). l)ialogue Coordination Functions 
of Japanese Sentenee--Hnal Particles. In Pro- 
ceedings of \[nterna*ional Symposium on Spoken 
Dialogue, pp. 145 148. 
Kawalnori, M. (1991 ). Japanese Sentence Final Parti- 
cles and Epistemic Modality. SIG-NI;-SL h~foT'- 
marion Processing Society of Japan, 41 48. (in 
Japanese). 
Kay, M. (1980). Algorithm Schemal, a and Data Strnc- 
tnres in Syntactic Processing. Teeh. rep. CSL- 
80-12, Xerox PARC. 
Kogure, K., Shimazu, A., L; Nakano, M. (1994). Phm- 
Based Utterance Understanding. In Proceedings 
of the /~Slh Conference of \[nform.ation Process- 
ing Society of Japan, Vol. 3, pp. 189 190. (in 
a apanese). 
Langer, tI. (1990). Syntactic Normalization of Spon- 
taneous Speech. In COLfNG-gO, pp. 180 183. 
Levelt, W. J. M. (1989). Speaking. MIT Press. 
Mellish, C. (1989). Some Chart-Based Techniques for 
Parsing Ill-Formed Input. In A CL-89, pp. 102 
109. 
Nagata, M., & Kogure, K. (1990). tlPSG-Based Lat- 
tice Parser for Spoken Japanese in a Spoken 
Language Translation System. In 1~;CAI-.90, pp. 
4(it 466. 
Nakano, M. (199\]). Constraint Projection: An I"fffl- 
cient Treatment of Disjunctive Feature Descrip- 
tions. In ACL-gl, pp. 307 314. 
Nakatani, C., & Itirschberg, J. (1993). A Speech-First 
Model for 1%epair Detection and Correction. In 
ACL-93, pp. 46. 53. 
Okada, M. (1991). A Unifieation-(7;rammar-Direeted 
One-Pass Search Algorithm for Parsing Spoken 
Language. \[n Proceedings of IUASSP-9I. 
Otsuka, lI., & Okada, M. (1992). Incremental Elabo- 
ration in Generating Spontaneons Speech. SIG- 
NLC92-/tI, lnslitute of Eleclronics, information 
and Communication L'ngineer.s. (in Japanese). 
Sadanobu, q'., & Takubo, Y. (1993). The Discourse 
Management Function of Fillers a case of "eeto" 
and "ano(o)" . h Proceedings oJ International 
Symposium on Spoken Dialog, pp. 271 274. 
Shieber, S. M. (1986). An Introduction to Unification- 
Based Approaches to Grammar. CSLI Lecture 
Notes Series No. 4. Stanford: CSLI. 
Shimazu, A., Kawamori, M., & Kogure, K. (1993@. 
Analysis of Interjectory Responses in I)ialoguc. 
SIG~NI, C-93-9, Institute of Electronics, In- 
formation and Comunication l,;ngineers. (in 
Japanese). 
Shimazn, A., Kogure, K., b. Nakano, M. (1993b). 
An 1,2xperimental Distributed Natural Language 
Processing System and its Application to Ro- 
bust Processing. in Proceedings of the Sympo- 
sium on lmplementalion of NaluraI Language 
Processing. Institute of Hectronies, Informa- 
tion and Communication Engineers/Japan So- 
ciety for Software Science and Technology. (in 
Japanese). 
Takeshita, A., & Fukmmga, It. (1991). Morphoh)gical 
Analysis for Spokcm l,anguage. In Proceedings of 
the 42nd Con.fcrence of Injbrmation Process*nO 
,5'oc~et9 of Japan, Vol. 3, pp. 5- 6. (in Japanese). 
Thomason, R. iI. (1990). Accommodation, Meaning, 
and lmplicature: Interdisciplinary t)'oundations 
for Pragmatics. In Cohen, P. 1%, Morgan, J., & 
Pollack, M. It;. (Eds.), Intentions in Uommuni- 
cation, pp. 325 1t64. MIT Press. 
Young, S. I%., Hauptmann, A. G., Ward, W. I1., 
Smith, E. T., & Werner, P. (1989). High bevel 
Knowledge Sourees in Usable Spe.ech l{ecogni- 
tion Systems. Comm*lnication of the ACM, ,~2(2), 
183 194. 
1020 
