Construction of Structurally Annotated Spoken Dialogue Corpus
Shingo Kato
Graduate School of Information Science,
Dept. of Information Engineering,
Nagoya University
Furo-cho, Chikusa-ku, Nagoya
gotyan@el.itc.nagoya-u.ac.jp
Shigeki Matsubara
Yukiko Yamaguchi
Nobuo Kawaguchi
Information Technology Center,
Nagoya University
Furo-cho, Chikusa-ku, Nagoya
Abstract
This paper describes the structural an-
notation of a spoken dialogue corpus.
By statistically dealing with the corpus,
the automatic acquisition of dialogue-
structural rules is achieved. The di-
alogue structure is expressed as a bi-
nary tree and 789 dialogues consist-
ing of 8150 utterances in the CIAIR
speech corpus are annotated. To eval-
uate the scalability of the corpus for
creating dialogue-structural rules, a di-
alogue parsing experiment was con-
ducted.
1 Introduction
With the improvement of speech processing tech-
nologies, spoken dialogue systems that appropri-
ately respond to a user’s spontaneous utterances
and cooperatively execute a dialogue are desired.
It is important for cooperative spoken dialogue
systems to understand the intentions of a user’s
utterances, the purpose of the dialogue, and its
achievement state (Litman, 1990). To solve this
issue, several approaches have been so far pro-
posed. One of them is an approach in which the
system expresses the knowledge of the dialogue
with a frame and executes the dialogue accord-
ing to that frame (Goddeau, 1996; Niimi, 2001;
Oku, 2004). However, it is difficult to make a
frame that totally defines the content of the dia-
logue. Additionally, there is a tendency for the
dialogue style to be greatly affected by the frame.
Figure 1: The data collection vehicle(DCV)
In this paper, we describe the construction of
a structurally annotated spoken dialogue corpus.
By statistically dealing with the corpus, we can
achieve the automatic acquisition of dialogue-
structural rules. We suppose that the system can
figure out the state of the dialogue through the in-
cremental building of the dialogue structure.
We use the CIAIR in-car spoken dialogue cor-
pus (Kawaguchi, 2004; Kawaguchi, 2005), and
describe the dialogue structure as a binary tree.
The tree expresses the purpose of partial dia-
logues and the relations between utterances or
partial dialogues. The speaker’s intention tags
were provided in the transcription of the corpus.
We annotated 789 dialogues consisting of 8150
utterances. Due to the advantages of the dialogue-
40
0022 - 01:37:398-01:41:513 F:D:I:C:
(F g16907g17079g16934g16939) [FILLER:well] &(F g16995g17079g17022g17027)
g16909g16903g16922g16903 [delicious] &g16997g16991g17010g17079
g16909g16905g16940g16982g16945 [Udon] &g16997g16993g17028g17070g17033
g16909g5163 [restaurant] &g16997g17050g17014
g11541g16912g16930g16903g16982g16938g16924g16911<SB> [want to go] &g16991g17000g17018g16991g17070g17026g17012g16999<SB>
0023 - 01:42:368-01:49:961 F:O:I:C:
g16946g16903 [well] &g17034g16991
g16918g16945 [this area] &g17006g17033
g12589g16914g16938g16924g16939 [near] &g17020g16998g17002g17026g17012g17027
g11946g11824g4848 [SUWAYA] &g17012g17066g17055
g3490g9395g12094g6572g16911 [“CHIKUSA
HOUGETSU”]&g17020g17002g17008g17046g17079g17005g17023g16999
g16919g16921g16903g16961g16924g16911<SB> [there are ] &g17007g17009g16991g17049g17012g16999<SB>
Figure 2: Transcription of in-car dialogue speech
Discourse act Action Object
Express
(Exp)
Propose
(Pro)
Request
(Req)
Statement
(Sta)
Suggest
(Sug)
Confirm
(Con)
Exhibit
(Exh)
Guide
(Gui)
ReSearch
(ReS)
Reserve
(Rev)
Search
(Sea)
Select
(Sel)
ExhibitDetail
(ExD)
Genre
(Gen)
IntentDetail
(InD)
Parking
(Par)
ParkingInfo
(PaI)
RequestDetail
(ReD)
ReserveInfo
(ReI)
SearchResult
(SeR)
SelectDetail
(SeD)
Shop
(Sho)
ShopInfo
(ShI)
Figure 3: A part of the LIT
structural rules being represented by context free
grammars, we were able to use an existing tech-
nique for natural language processing to reduce
the annotation burden.
In section 2, we explain the CIAIR in-car spo-
ken dialogue corpus and the speaker’s intention
tags. In sections 3 and 4, we discuss the design
policy of a structurally annotated spoken dialogue
corpus and the construction of the corpus. In sec-
tion 5, we evaluate the corpus.
2 Spoken Dialogue Corpus with Layered
Intention Tags
The Center for Integrated Acoustic Information
Research (CIAIR), Nagoya University, has been
compiling a database of in-car speech and di-
alogue since 1999, in order to achieve robust
spoken dialogue systems in actual usage envi-
ronments (Kawaguchi, 2004; Kawaguchi, 2005)}
This corpus has been recorded using more than
800 subjects. Each subject had conversations with
three types of dialogue system: a human operator,
the Wizard of OZ system, and the conversational
system.
In this project, a system was specially built in
a Data Collection Vehicle (DCV), shown in Fig-
ure 1, and was used for the synchronous recording
of multi-channel audio data, multi-channel video
data, and vehicle related data. All dialogue data
were transcribed according to transcription stan-
dards in compliance with CSJ (Corpus of Spon-
taneous Japanese) (Maekawa, 2000) and were as-
signed discourse tags such as fillers, hesitations,
and slips. An example of a transcript is shown in
Figure 2. Utterances were divided into utterance
units by a pause of 200 ms or more.
These dialogues are annotated by speech act
tags called Layered Intention Tags (LIT) (Irie,
2004(a)), which indicate the intentions of the
speaker’s utterances. LIT consists of four layers:
“Discourse act”, “Action”, “Object”, and “Argu-
ment”. Figure 3 shows a part of the organization
of LIT. As Figure 3 shows, the lower layered in-
tention tag depends on the upper layered one. In
principle, one LIT is given to one utterance unit.
35,421 utterance units have been tagged by hand
(Irie, 2004(a)).
In this research, we use parts of the restau-
rant guide dialogues between a driver and a hu-
man operator. An example of the dialogue cor-
pus with LIT is shown in Table 1. In the col-
umn called Speaker, “D” means a driver’s ut-
terance and “O” means an operator’s one. We
used the Discourse act, Action, and Object layers
and extended them with speaker symbols such as
“D+Request+Search+Shop”. There are 41 types
of extended LIT. Because the “Argument” layer
is too detailed to express the dialogue structure,
we omitted it.
41
Table 1: Example of the dialogue corpus with LIT
Utterance LIT
Number Speaker Transcription First layer Second layer Third layer
(Discourse Act) (Action) (Object)
277 D kono hen de tai ga tabera reru tokoro nai
kana.
Request Search Shop
(I’d like to eat some sea bream.)
278 O hai. Statement Exhibit IntentDetail
(Let me see.)
279 O o ryori wa donna o ryouri ga yorosi katta
desuka.
Request Select Genre
(Which kind do you like?)
280 D nama kei ga ii kana. Statement Select Genre
(Fresh and roe.)
281 D Nabe ga tabe tai desu. Statement Select Genre
(I want to have a Hotpot.)
282 O hai kono tikaku desu to tyankonabe to oden
kaiseki ato syabusyabu nado ga gozai masu
ga.
Statement Exhibit SearchResult
(Well, there are restaurants near here that
serve sumo wrestler’s stew, Japanese hot-
pot, and sliced beef boiled with vegetables.)
283 D oden kaiseki ga ii. Statement Select Genre
(I love Japanese Hotpot.)
284 O hai sou simasu to “MARU” to iu omise ni
nari masu ga.
Statement Exhibit SearchResult
(“MARU” restaurant is suitable.)
285 O yorosi katta de syou ka. Request Exhibit IntentDetail
(How about this?)
286 D yoyaku wa hituyou ari masu ka. Request Exhibit ShopInfo
(Should I make a reservation?)
287 O a yoyaku no hou wa yoyoku sare naku temo
o mise ni wa hairu koto ga deki masu ga.
Statement Exhibit ShopInfo
(No, a reservation is not necessary.)
288 D a zya soko made annai onegai si masu. Request Guide Shop
(I see. Please guide me there.)
289 O kasikomari masi ta. Statement Exhibit IntentDetail
(Sure.)
290 O sore dewa “MARU” made go annnai itasi
masu.
Express Guide Shop
(Now, I’m navigating to “MARU”)
291 D hai. Statement Exhibit IntentDetail
(Thanks.)
3 Description of Dialogue Structure
3.1 Dialogue structure
In this research, we assume that the fundamental
unit of a dialogue is an utterance to which one LIT
is given. To make the structural analysis of the
dialogue more efficient, we express the dialogue
structure as a binary tree. We defined a category
called POD (Part-Of-Dialogue), according to the
observations of the restaurant guide task, that was
especially focused on what subject was dealt with.
As a result of this, 11 types of POD were built
(Table 2). Each node of a structural tree is labeled
with a POD or LIT. The dialogue structural tree
of Table 1 is shown in Figure 4.
3.2 The design policy of dialogue structure
To consider a dialogue as an LIT sequence, LIT
providing process (Irie, 2004(b)) usually should
be done. Furthermore, repairs and corrections are
eliminated because they do not provide LIT. In
this research, we used an LIT sequence provided
in the corpus. After that, the annotation of the
dialogue structure was done in the following way.
Merging utterances: When two adjoining utter-
ances such as request and answer, they seem
to be able to pair up and merge with an
42
appropriate POD. In Table 1, for example,
the utterance “Should I make a reservation?”
(#286) is a request and the answer to #286 is
“No, a reservation is not necessary”(#287).
In this way, utterances are combined with the
POD “S INFO”.
When the LIT’s of two adjacent utter-
ances are corresponding, these utterances
are supposed to be paired and merged with
the same LIT. Utterance “Fresh and roe”
(#280) and “I want to have Hotpot” (#281)
are related to choosing the style of restau-
rant and are provided with the same LIT.
Therefore they are combined with the LIT
“D+Statement+Select+Genre”.
Merging partial dialogues: When two adjoin-
ing partial dialogues (i.e. a partial tree) are
composing another partial dialogue, they are
merged with a proper POD. In Table 1, for
example, a search dialogue (from #277 to
#285, SRCH) and a shop information dia-
logue helping search (from #286 to #287,
S INFO) are combined and labeled as the
POD “SLCT”.
When the POD’s of two adjacent partial di-
alogues are corresponding, these dialogues
are merged with the same POD. Two search
dialogues (one is from #277 to #282, other
is from #283 to #285) are combined with the
same POD “SRCH”.
The root of the tree: The POD of the root of the
tree is “GUIDE”, because the domain of the
corpus is restaurant guide task.
4 Construction of Structurally
Annotated Spoken Dialogue Corpus
4.1 Work environment and procedures
We made a dialogue parser as a supportive envi-
ronment for annotating dialogue structures.
Applying the dialogue-structural rules, which
are obtained from annotated structural trees (like
Figure 4.), the parser analyzes the inputs of
the LIT sequences and the outputs off all avail-
able dialogue-structural trees. An annotator then
chooses the correct tree from the outputs. When
Table 2: Type and substance of POD’s
POD Substance
GENRE choosing style of cuisine.
GUIDE guidance to restaurant or parking.
P INFO extracting parking information such
as vacant space, neighborhood.
P SRCH searching for a parking space.
S INFO extracting shop information such as
price, reservation, menu, area, fixed
holiday.
SLCT selecting a restaurant or parking
space.
SRCH searching for a restaurant.
SRCH RQST requesting a search.
RSRV making a reservation.
RSRV DTL extracting reservation information
such as time, number of people, etc.
RSRV RQST requesting a reservation.
the outputs don’t include the correct tree, the an-
notator should rectify the wrong tree rewriting the
list form of the tree. In this way, we make the an-
notation more efficient.
The dialogue parser was implemented using the
bottom-up chart parsing (Kay, 1980). The struc-
tural rules were extracted from all annotated di-
alogues. In the environment outlined above, we
have worked at bootstrap building. That is, we
1. outputed the dialogue structures through the
parser.
2. chose and rectified the dialogue structure us-
ing an annotator.
3. extracted some structural rules from some
dialogue-structural trees.
We repeated these procedures and increased the
structural rules incrementally, so that the dialogue
parser improved it’s operational performance.
4.2 Structurally annotated dialogue corpus
We built a structurally annotated dialogue corpus
in the environment described in Section 4.1, us-
ing the restaurant guide dialogues in the CIAIR
corpus. The corpus includes 789 dialogues con-
sisting of 8150 utterances. One dialogue is com-
posed of 11.61 utterances. Table 3 shows them in
detail.
43
g15003
g15014
g15014
g15003
g15014
g15003
g15014
g15014
g15003
g15014
g15003
g15003
g15014
g15014
g15003
Speaker
g15019g15039g15032g15045g15042g15050g14981
Now, I'm navigating to "MARU"
Sure.
I see. Please guide me there.
No, reservation is not necessary.
Should I make a reservation?
How about this?
"MARU“ restaurant is suitable.
I love Japanese Hotpot.
Well, there are restaurant near hear 
that serve sumo wrestler's stew, 
Japanese hotchpotch and sliced beef 
boiled with vegetables.
I want to have Hotpot.
Fresh and row.
Which kind do you like?
Let me see.
I'd like to eat sea bream.
Description
g14985g14992g14984
g14985g14992g14983
g14985g14991g14992
g14985g14991g14991
g14985g14991g14990
g14985g14991g14989
g14985g14991g14988
g14985g14991g14987
g14985g14991g14986
g14985g14991g14985
g14985g14991g14984
g14985g14991g14983
g14985g14990g14992
g14985g14990g14991
g14985g14990g14990
g15020g15051g15051g15036g15049g15032g15045g15034g15036
g15013g15052g15044g15033g15036g15049
D+Sta+Exh+InD
D+Req+Sea+Sho
O+Sta+Exh+InD
O+Req+Sel+Gen
D+Sta+Sel+Gen
D+Sta+Sel+Gen
O+Sta+Exh+SeR
D+Sta+Sel+Gen
O+Sta+Exh+SeR
O+Req+Exh+InD
D+Req+Exh+ShI
O+Sta+Exh+ShI
D+Req+Gui+Sho
O+Sta+Exh+InD
O+Exp+Gui+Sho
D+Req+Sea+Sho
D+Sta+Sel+Gen
GENRE
SRCH_
RQST
SRCH
O+Sta+Exh+Sea
SRCH
SRCH
S_INFO
SLCT
SLCT
O+Exp+Gui+Sho
D+Req+Gui+Sho
GUIDEg15906SLCT  O+Exp+Gui+Sho GENREg15906O+Req+Sel+Gen D+Sta+Sel+Gen
SLCTg15906SLCT  D+Req+Gui+Sho S_INFOg15906D+Req+Exh+ShI O+Sta+Exh+ShI
SLCTg15906SRCH  S_INFO D+Sta+Sel+Geng15906D+Sta+Sel+Gen D+Sta+Sel+Gen
SRCHg15906SRCH  SRCH D+Req+Gui+Shog15906D+Req+Gui+Sho O+Sta+Exh+InD
SRCHg15906SRCH_RQST  O+Sta+Exh+SeR D+Req+Sea+Shog15906D+Req+Sea+Sho O+Sta+Exh+InD
SRCH_RQSTg15906D+Req+Sea+Sho GENRE O+Exp+Gui+Shog15906O+Exp+Gui+Sho D+Sta+Exh+InD
D+Sta+Exh+SeRg15906O+Sta+Exh+SeR O+Re+Exh+InD
Figure
4:
Dialogue-structural
tree
and
rules
for
Table
1
44
Table 3: Corpus statistics
number of dialogues 789
number of utterances 8150
number of structural rules 297
utterances per one dialogue 11.61
number of dialogue-structural tree types 659
number of LIT sequence types 657
5 Evaluation of Structurally Annotated
Dialogue Corpus
To evaluate the scalability of the corpus for creat-
ing dialogue-structural rules, a dialogue parsing
experiment was conducted. In the experiment,
all 789 dialogues were divided into two data sets.
One of them is the test data consists of 100 dia-
logues and the other is the training data consists
of 689 dialogues. Furthermore, the training data
were divided into 10 training sets.
By increasing the training data sets, we ex-
tracted the probabilistic structural-rules from each
data. We then parsed the test data using the rules
and ranked their results by probability.
In the evaluation, the coverage rate, the correct
rate, and the N-best correct rate were used.
Coverage rate BP BWD4CPD6D7CTCS
BW
D8CTD7D8
Correct rate BP
BW
CRD3D6D6CTCRD8
BW
D8CTD7D8
N-best correct rate BP
BW
D2CQCTD7D8
BW
D8CTD7D8
BW
D4CPD6D7CTCS
BP Number of the dialogues which can be parsed
BW
CRD3D6D6CTCRD8
BP Number of the dialogues which include the cor-
rect tree in their parse trees
BW
D2CQCTD7D8
BP Number of the dialogues which include the cor-
rect tree in their n-best parse trees
BW
D8CTD7D8
BP Number of the dialogues in the test data
The results of the evaluation of the coverage
rate and the correct rate are shown in Figure 5.
The correct rates for each of the training sets,
ranked from 1-best to 10-best, are shown in Fig-
ure 6.
In Figure 5, both the coverage rate and the cor-
rect rate improved as the training data was in-
creased. The coverage rate of the training set con-
sisting of 689 dialogues was 92%. This means
g14983g14981g14992g14985
g14983g14981g14991g14989
g14983g14981g14987g14983
g14983g14981g14988g14983
g14983g14981g14989g14983
g14983g14981g14990g14983
g14983g14981g14991g14983
g14983g14981g14992g14983
g14984g14981g14983g14983
g14990g14983 g14984g14987g14983 g14985g14984g14983 g14985g14991g14983 g14986g14988g14983 g14987g14985g14983 g14987g14992g14983 g14988g14989g14983 g14989g14986g14983 g14989g14991g14992g15018g15040g15057g15036g14967g15046g15037g14967g15051g15049g15032g15040g15045g15040g15045g15038g14967g15035g15032g15051g15032
g15002g15046
g15053g15036
g15049g15032g15038g15036
g14979g14967g15002g15046
g15049g15049g15036
g15034g15051g14967
g15049g15032g15051
g15036
g14983
g14984g14983
g14985g14983
g14986g14983
g14987g14983
g14988g14983
g14989g14983
g15013g15052
g15044g15033
g15036g15049g14967
g15046g15037g14967
g15032g15053
g15036g15049g15032
g15038g15036
g14967g15047g15032
g15049g15050g15036
g14967g15051g15049
g15036g15036
g15050
g15002g15046g15053g15036g15049g15032g15038g15036g14967g15049g15032g15051g15036
g15002g15046g15049g15049g15036g15034g15051g14967g15049g15032g15051g15036
g15013g15052g15044g15033g15036g15049g14967g15046g15037g14967g15032g15053g15036g15049g15032g15038g15036
g15047g15032g15049g15050g15036g14967g15051g15049g15036g15036g15050
Figure 5: The relation between the size of training data
and coverage and correct rate.
g14983g14981g14991g14983
g14983g14981g14986g14988
g14983g14981g14987g14983
g14983g14981g14987g14988
g14983g14981g14988g14983
g14983g14981g14988g14988
g14983g14981g14989g14983
g14983g14981g14989g14988
g14983g14981g14990g14983
g14983g14981g14990g14988
g14983g14981g14991g14983
g14983g14981g14991g14988
g14984 g14985 g14986 g14987 g14988 g14989 g14990 g14991 g14992 g14984g14983g15013g14980g15033g15036g15050g15051
g15013g14980
g15033g15036
g15050g15051g14967
g15034g15046
g15049g15049g15036
g15034g15051
g14967g15049g15032
g15051g15036
g15003g15032g15051g15032g14967g15050g15040g15057g15036g14967g14990g14983
g15003g15032g15051g15032g14967g15050g15040g15057g15036g14967g14985g14984g14983
g15003g15032g15051g15032g14967g15050g15040g15057g15036g14967g14986g14988g14983
g15003g15032g15051g15032g14967g15050g15040g15057g15036g14967g14987g14992g14983
g15003g15032g15051g15032g14967g15050g15040g15057g15036g14967g14989g14986g14983
g15003g15032g15051g15032g14967g15050g15040g15057g15036g14967g14989g14991g14992
Figure 6: The relation between the size of training data
and the n-best correct rate.
that the rules that were from the training set en-
abled the parsing of a wide variety dialogues. The
fact the correct rate was 86% shows that, using the
rules, the correct structures can be built for a large
number of dialogues.
Three in eight failure dialogues had continued
after a guidance for a restaurant. Therefore, we
assume that offering guidance to a restaurant is a
termination of the dialogue, in which case they
couldn’t be analyzed. Another three dialogues
couldn’t be analyzed because they included some
LIT which rarely appeared in the training data.
The cause of failure in the other two dialogues is
that an utterance that should be combined with its
adjoining utterance is abbreviated.
Figure 6 shows that the 10-best correct rate for
45
the training set consisting of 689 dialogues was
80%. Therefore the correct rate is 86%, and ap-
proximately 93% (80/86) of the dialogues that can
be correctly analyzed include the correct tree in
their top-10. According to Figure 5, the number
of average parse trees increased with the growth
of the training data. However, most of the di-
alogues that can be analyzed correctly are sup-
posed to include the correct tree in their top-10.
Therefore, it is enough to refer to the top-10 in a
situation where the correct one should be chosen
from the set of candidates, such as in the speech
prediction and the dialogue control. As a result,
the high-speed processing is achieved.
6 Conclusion
In this paper, we described the construction of
a structurally annotated spoken dialogue corpus.
From observating the restaurant guide dialogues,
we designed the policy of the dialogue structure
and annotated 789 dialogues consisting of 8150
utterances. Furthermore, we have evaluated the
scalability of the corpus for creating dialogue-
structural rules.
We now introduce the application field of the
structurally annotated dialogue corpus.
Discourse analysis: Using a POD labeled infor-
mation for each partial structure of the dia-
logue, we can obtain information such as the
structure of the domain, the user’s tasks, the
dialogue formats, etc.
Speech prediction and dialogue control: A
system builds the structure of an input up to
date and extracts the dialogue example that
is most similar to the structure of the input
from the corpus. If the next utterance or LIT
of the extracted dialogue is the user’s, the
system waits for the user’s utterance and
predicts its meaning and intention. If the
system’s utterance is next, the system uses
the utterance or LIT to control the dialogue.
At the present time, we have run up the data of the
corpus and built probabilistic dialogue-structural
trees. Next, we will apply the trees to some com-
ponents of the spoken dialogue systems such as
speech prediction and dialogue control.
Acknowledgments
The authors would like to thank Ms. Yuki Irie for
her valuable comments about the design of the di-
alogue structure. This research was partially sup-
ported by the Grant-in-Aid for Scientific Research
(No. 15300045) of JSPS.
References
David Goddeau, Helen Meng, Joe Poliformi,
Stephanie Seneff, and Senis Busayapongchai: A
form-based dialogue manager for spoken language
applications, Proc. of ICSLP’96, pp.701-704, 1996.
Diane J. Litman and James F. Allen : Discourse Pro-
cessing and Commonsense Plans. Phillip R. Cohen,
Jerry Morgan, Martha E. Pollack, editors. Intentions
in Communication. pp.365-388, MIT Press, Cam-
bridge, MA, 1990.
Kikuo Maekawa, Hanae Koiso, Sadaoki Furui, and
Hitoshi Isahara: Spontaneous speech corpus of
Japanese, LREC-2000, pp.947-952, 2000.
Martin Kay: Algorithm Schemata and Data Struc-
tures in Syntactic Processing, TR CSL-80-12, Xe-
rox PARC, 1980.
Nobuo Kawaguchi, Kazuya Takeda, and Fumitada
Itakura: Multimedia corpus of in-car speech com-
munication. J. VLSI Signal Processing, vol.36,
no.2, pp.153-159, 2004.
Nobuo Kawaguchi, Shigeki Matsubara, Kazuya
Takeda, and Fumitada Itakura: CIAIR In-Car
Speech Corpus -Influence of Driving States-. IE-
ICE Trans. on Information and System, E88-D(3),
pp.578-582, 2005.
Tomoki Oku, Takuya Nishimoto, Masahiro Araki,
and Yasuhisa Niimi: A Task-Independent Control
Method for Spoken Dialogs, Systems and Comput-
ers in Japan, Vol.35, No.14, 2004.
Yasuhisa Niimi, Tomoki Oku, Takuya Nishimoto, and
Masahiro Araki: A rule based approach to extrac-
tion of topic and dialog acts in a spoken dialog sys-
tem, Proc. of EUROSPEECH2001, vol.3, pp.2185-
2188, 2001.
46
Yuki Irie, Shigeki Matsubara, Nobuo Kawaguchi,
Yukiko Yamaguchi, and Yasuyoshi Inagaki: Design
and Evaluation of Layered Intention Tag for In-Car
Speech Corpus, Proc. of the INTERNATIONAL
SYMPOSIUM ON SPEECH TECHNOLOGY
AND PROCESSING SYSTEMS iSTEPS-2004,
pp.82-86, 2004.
Yuki Irie, Shigeki Matsubara, Nobuo Kawaguchi,
Yukiko Yamaguchi, and Yasuyoshi Inagaki: Speech
Intention Understanding based on Decision Tree
Learning, Proceedings of 8th International Confer-
ence on Spoken Language Processing, Cheju, Ko-
rea, 2004.
47
