Dependency Structure Analysis and Sentence Boundary
Detection in Spontaneous Japanese
Kazuya Shitaoka†, Kiyotaka Uchimoto‡, Tatsuya Kawahara†, Hitoshi Isahara‡
†School of Informatics, Kyoto University
Yoshida-honmachi, Sakyo-ku, Kyoto 606-8501, Japan
{shitaoka,kawahara}@ar.media.kyoto-u.ac.jp
‡National Institute of Information and Communications Technology
3-5 Hikari-dai, Seika-cho, Soraku-gun, Kyoto 619-0289, Japan
{uchimoto,isahara}@nict.go.jp
Abstract
This paper describes a project to detect dependen-
cies between Japanese phrasal units called bunsetsus,
and sentence boundaries in a spontaneous speech
corpus. In monologues, the biggest problem with de-
pendency structure analysis is that sentence bound-
aries are ambiguous. In this paper, we propose
two methods for improving the accuracy of sentence
boundary detection in spontaneous Japanese speech:
One is based on statistical machine translation us-
ing dependency information and the other is based
on text chunking using SVM. An F-measure of 84.9
was achieved for the accuracy of sentence bound-
ary detection by using the proposed methods. The
accuracy of dependency structure analysis was also
improved from 75.2% to 77.2% by using automat-
ically detected sentence boundaries. The accuracy
of dependency structure analysis and that of sen-
tence boundary detection were also improved by in-
teractively using both automatically detected depen-
dency structures and sentence boundaries.
1 Introduction
The “Spontaneous Speech: Corpus and Pro-
cessing Technology” project has been sponsor-
ing the construction of a large spontaneous
Japanese speech corpus, Corpus of Spontaneous
Japanese (CSJ) (Maekawa et al., 2000). The
CSJ is the biggest spontaneous speech corpus in
the world, and it is a collection of monologues
and dialogues, the majority being monologues
such as academic presentations. The CSJ in-
cludes transcriptions of speeches as well as audio
recordings. Approximately one tenth of the CSJ
has been manually annotated with information
about morphemes, sentence boundaries, depen-
dency structures, discourse structures, and so
on. The remaining nine tenths of the CSJ
have been annotated semi-automatically. A fu-
ture goal of the project is to extract sentence
boundaries, dependency structures, and dis-
course structures from the remaining transcrip-
tions. This paper focuses on methods for au-
tomatically detecting sentence boundaries and
dependency structures in Japanese spoken text.
In many cases, Japanese dependency struc-
tures are defined in terms of the dependency
relationships between Japanese phrasal units
called bunsetsus. To define dependency relationships
between all bunsetsus in spontaneous
speech, we need to define not only the depen-
dency structures in all sentences but also the
inter-sentential relationships, or, discourse re-
lationships, between the sentences, as depen-
dency relationships between bunsetsus. How-
ever, it is diﬃcult to define and detect discourse
relationships between sentences because of sig-
nificant inconsistencies in human annotations
of discourse structures, especially with regard
to spontaneous speech. We also need to know
intra-sentential dependency structures in order
to use the results of dependency structure anal-
ysis for sentence compaction in automatic text
summarization or case frame acquisition. Be-
cause it is diﬃcult to define discourse relation-
ships between sentences, depending on the ac-
tual application, it is usually enough to define
and detect the dependency structure of each
sentence. Therefore, the CSJ was annotated
with intra-sentential dependency structures for
sentences in the same way this is usually done
for a written text corpus. However, there is
a big diﬀerence between a written text corpus
and a spontaneous speech corpus: In sponta-
neous speech, especially when it is long, sen-
tence boundaries are often ambiguous. In the
CSJ, therefore, sentence boundaries were de-
fined based on clauses whose boundaries were
automatically detected by using surface infor-
mation (Maruyama et al., 2003), and they were
detected manually (Takanashi et al., 2003). Our
definition of sentence boundaries follows the
definition used in the CSJ.
Almost all previous research on Japanese de-
pendency structure analysis dealt with depen-
dency structures in written text (Fujio and Mat-
sumoto, 1998; Haruno et al., 1998; Uchimoto et
al., 1999; Uchimoto et al., 2000; Kudo and Mat-
sumoto, 2000). Although Matsubara and col-
leagues did investigate dependency structures
in spontaneous speech (Matsubara et al., 2002),
the target speech was dialogues where the ut-
terances were short and sentence boundaries
could be easily defined based on turn-taking
data. In contrast, we investigated dependency
structures in spontaneous and long speeches in
the CSJ. The biggest problem in dependency
structure analysis with spontaneous and long
speeches is that sentence boundaries are am-
biguous. Therefore, sentence boundaries should
be detected before or during dependency struc-
ture analysis in order to obtain the dependency
structure of each sentence.
In this paper, we first describe the problems
with dependency structure analysis of sponta-
neous speech. Because the biggest problem is
ambiguous sentence boundaries, we focus on
sentence boundary detection and propose two
methods for improving the accuracy of detec-
tion.
2 Dependency Structure Analysis
and Sentence Boundary Detection
in Spontaneous Japanese
First, let us briefly describe how dependency
structures can be represented in a Japanese sen-
tence. In Japanese sentences, word order is
rather free, and subjects and objects are often
omitted. In languages having such characteris-
tics, the syntactic structure of a sentence is gen-
erally represented by the relationship between
phrasal units, or bunsetsus, based on a depen-
dency grammar. Phrasal units, or bunsetsus,
are minimal linguistic units obtained by seg-
menting a sentence naturally in terms of seman-
tics and phonetics. Each bunsetsu consists of
one or more morphemes. For example, the sentence
“彼はゆっくり歩いている (kare-wa yukkuri
aruite-iru, He is walking slowly)” can be divided
into three bunsetsus, “彼は (kare-wa, he)”,
“ゆっくり (yukkuri, slowly)”, and “歩いている
(aruite-iru, is walking)”. In this sentence, the first and
second bunsetsus depend on the third one.
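A dependency structure of this kind is often stored as one modifiee index per bunsetsu. The following minimal sketch encodes the example sentence that way; the list-of-heads representation and the helper name `modifiers_of` are illustrative assumptions and not the CSJ annotation format.

```python
# Illustrative encoding (an assumption, not the CSJ format): heads[i] is the
# index of the bunsetsu that bunsetsu i depends on, or None if it has none.
bunsetsus = ["kare-wa", "yukkuri", "aruite-iru"]
heads = [2, 2, None]  # the first and second bunsetsus depend on the third


def modifiers_of(heads, target):
    """Return the indices of all bunsetsus that depend on `target`."""
    return [i for i, h in enumerate(heads) if h == target]


print([bunsetsus[i] for i in modifiers_of(heads, 2)])
```

The same head-index representation is assumed in the later sketches.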
There are many diﬀerences between writ-
ten text and spontaneous speech, and there
are problems peculiar to spontaneous speech
in dependency structure analysis and sentence
boundary detection. The following sections de-
scribe some typical problems and our solutions.
2.1 Problems with Dependency
Structure Analysis
Ambiguous sentence boundaries
As described in Section 1, in this study, we
assumed that ambiguous sentence boundaries
are the biggest problem in dependency
structure analysis of spontaneous speech.
So in this paper, we mainly focus on this
problem and describe our solution to it.
Independent bunsetsus
In spontaneous speech, we sometimes find
that modifiees are missing because utter-
ance planning changes in the middle of the
speech. Also, we sometimes find bunsetsus
whose dependency relationships are useless
for understanding the utterance. These include
fillers such as “あのー (anoh, well)”
and “そのー (sonoh, well)”, adverbs that
behave like fillers such as “もう (mou)”,
responses such as “はい (hai, yes)” and
“うん (un, yes)”, conjunctions such as “で
(de, and)”, and disfluencies. In these cases,
bunsetsus are assumed to be independent,
and as a result, they have no modifiees in
the CSJ. For example, 14,988 bunsetsus in
188 talks in the CSJ are independent.
We cannot ignore fillers, responses, and
disfluencies because they frequently ap-
pear in spontaneous speech. However,
we can easily detect them by using the
method proposed by Asahara and Mat-
sumoto (Asahara and Matsumoto, 2003).
In this paper, fillers, responses, and disflu-
encies were eliminated before dependency
structure analysis and sentence boundary
detection by using morphological informa-
tion and labels. In the CSJ, fillers and re-
sponses are interjections, and almost all of
them are marked with label (F). Disfluen-
cies are marked with label (D).
In this paper, every independent bunsetsu
was assumed to depend on the next one.
However, practically speaking, indepen-
dent bunsetsus should be correctly detected
as “independent”. This detection is one of
our future goals.
Crossed dependency
In general, dependencies in Japanese written
text do not cross. In contrast, dependencies
in spontaneous speech sometimes do.
For example, “これが (kore-ga, this)” depends
on “正しいと (tadashii-to, is right)” and
“私は (watashi-wa, I)” depends on
“思う (omou, think)” in the sentence
“これが/私は/正しいと/思う”, where “/”
denotes a bunsetsu boundary. Therefore,
the two dependencies cross.
However, there are few crossed
dependencies in the CSJ: in the 188 talks, we
found 689 such dependencies among a total of
170,760 bunsetsus. In our experiments,
therefore, we assumed that dependencies
did not cross. Correctly detecting crossed
dependencies is one of our future goals.
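To make the notion of crossing concrete, the following sketch checks a head-index list for crossed dependencies. It is only an illustration: the function name `crossing_pairs` and the head-index representation are assumptions, and this is not the procedure used to count crossings in the CSJ.

```python
from itertools import combinations


def crossing_pairs(heads):
    """Return pairs of dependency arcs that cross each other.

    heads[i] is the index of the bunsetsu that bunsetsu i depends on,
    or None if bunsetsu i has no modifiee. Each arc is returned as a
    (left_index, right_index) tuple.
    """
    arcs = sorted(
        (min(i, h), max(i, h)) for i, h in enumerate(heads) if h is not None
    )
    crossed = []
    for (a, b), (c, d) in combinations(arcs, 2):
        # After sorting, a <= c. The arcs cross when the second arc starts
        # strictly inside the first one and ends strictly outside it.
        if a < c < b < d:
            crossed.append(((a, b), (c, d)))
    return crossed


# kore-ga -> tadashii-to, watashi-wa -> omou, tadashii-to -> omou
print(crossing_pairs([2, 3, 3, None]))
```

For the example sentence, the arcs from the first to the third bunsetsu and from the second to the fourth bunsetsu are reported as a crossing pair.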
Self-correction
We often find self-corrections in sponta-
neous speech. For example, in the 188 talks
in the CSJ there were 2,544 self-corrections.
In the CSJ, self-corrections are represented
as dependency relationships between bun-
setsus, and label D is assigned to them.
Coordination and appositives are also rep-
resented as dependency relationships be-
tween bunsetsus, and labels P and A are
assigned to them, respectively. The defi-
nitions of coordination and appositives fol-
low those of the Kyoto University text cor-
pus (Kurohashi and Nagao, 1997). Both
the labels and the dependencies should
be detected for applications such as au-
tomatic text summarization. However, in
this study, we detected only the dependen-
cies between bunsetsus, and we did it in the
same manner as in previous studies using
written text.
Inversion
Inversion occurs more frequently in spontaneous
speech than in written text. For
example, in the 188 talks in the CSJ there
were 172 inversions. In the CSJ, inver-
sions are represented as dependency rela-
tionships going in the direction from right
to left. In this study, we considered it important
to detect these dependencies, so we manually
changed their direction to left-to-right.
2.2 Problems with Sentence Boundary
Detection
In spontaneous Japanese speech, sentence
boundaries are ambiguous. In the CSJ, there-
fore, sentence boundaries were defined based
on clauses whose boundaries were automatically
detected using surface information (Maruyama
et al., 2003), and they were detected manually
(Takanashi et al., 2003). Clause boundaries can
be classified into the following three groups.
Absolute boundaries, or sentence boundaries
in their usual meaning. Such boundaries
are often indicated by verbs in their
basic form.
Strong boundaries, or points that can be
regarded as major breaks in utterances and
that can be used for segmentation. Such
boundaries are often indicated by clauses
whose rightmost words are “が (ga, but)”
or “し (shi, and)”.
Weak boundaries, or points that can
be used for segmentation because they
strongly depend on other clauses. Such
boundaries are often indicated by clauses
whose rightmost words are “ので (node,
because)” or “たら (tara, if)”.
These three types of boundary diﬀer in the
degree of their syntactic and semantic com-
pleteness and the dependence of their sub-
sequent clauses. Absolute boundaries and
strong boundaries are usually defined as sen-
tence boundaries. However, sentence bound-
aries in the CSJ are diﬀerent from these two
types of clause boundaries, and the accuracy
of rule-based automatic sentence boundary de-
tection in the 188 talks in the CSJ has an F-
measure of approximately 81, which is the ac-
curacy for a closed test. Therefore, we need a
more accurate sentence boundary detection sys-
tem.
Shitaoka et al. (Shitaoka et al., 2002) pro-
posed a method for detecting sentence bound-
aries in spontaneous Japanese speech. Their
definition of sentence boundaries is approxi-
mately the same as that of absolute bound-
aries described above. In this method, sen-
tence boundary candidates are extracted by
character-based pattern matching using pause
duration. However, it is diﬃcult to extract
appropriate candidates by this method be-
cause there is a low correlation between pauses
and the strong and weak boundaries described
above. It is also hard to detect noun-final
clauses by character-based pattern matching.
One method based on machine learning, a
method based on maximum entropy models,
has been proposed by Reynar and Ratnaparkhi
(Reynar and Ratnaparkhi, 2000). However, the
target in their study was written text. This
method cannot readily be used for spontaneous
speech because in speech, there are no punc-
tuation marks such as periods. Other features
of utterances should be used to detect sentence
boundaries in spontaneous speech.
3 Approach of Dependency
Structure Analysis and Sentence
Boundary Detection
The outline of the processes is shown in Fig-
ure 1.
[Figure 1 (flowchart). Processes: 0: Morphological Analysis;
1: Sentence Boundary Detection (Baseline); 2: Sentence Boundary
Detection (SVM); 3: Dependency Structure Analysis (Baseline);
4: Dependency Structure Analysis; 5: Sentence Boundary Detection
(Language model); 6: Sentence Boundary Detection (SVM);
7: Dependency Structure Analysis (Again). Inputs shown:
(A) clause expression, pause duration, word 3-gram model;
(B) pause duration, clause expression, word information;
(C) word information, distance between bunsetsus;
(A) + information of dependencies; (B) + information of dependencies.]
Figure 1: Outline of dependency structure analysis
and sentence boundary detection.
3.1 Dependency Structure Analysis
In statistical dependency structure analysis of
Japanese speech, the likelihood of dependency
is represented by a probability estimated by a
dependency probability model.
Given sentence $S$, let us assume that it is
uniquely divided into $n$ bunsetsus, $b_1, \ldots, b_n$,
and that it is represented as an ordered set of
bunsetsus, $B = \{b_1, \ldots, b_n\}$. Let $D$ be an
ordered set of dependencies in the sentence and let
$D_i$ be a dependency whose modifier is bunsetsu
$b_i$ ($i = 1, \ldots, n-1$). Let us also assume that
$D = \{D_1, \ldots, D_{n-1}\}$. Statistical dependency
structure analysis finds dependencies that maximize
probability $P(D|S)$ given sentence $S$.
The conventional statistical model (Collins,
1996; Fujio and Matsumoto, 1998; Haruno et
al., 1998; Uchimoto et al., 1999) uses only
the relationship between two bunsetsus to estimate
the probability of dependency, whereas
the model in this study (Uchimoto et al., 2000)
takes into account not only the relationship be-
tween two bunsetsus but also the relationship
between the left bunsetsu and all the bunsetsus
to its right. This model uses more information
than the conventional model.
We implemented this model within a max-
imum entropy modeling framework. The fea-
tures used in the model were basically attributes
of bunsetsus, such as character strings, parts
of speech, and types of inflections, as well as
those that describe the relationships between
bunsetsus, such as the distance between bun-
setsus. Combinations of these features were also
used. To find $D_{best}$, the set of dependencies
that maximizes P(D|S), we analyzed the sentences
backwards (from right to left). In the backward
analysis, we can limit the search space eﬀec-
tively by using a beam search. Sentences can
also be analyzed deterministically without great
loss of accuracy (Uchimoto et al., 1999). So we
analyzed a sentence backwards and determinis-
tically.
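The backward, deterministic procedure can be sketched as follows. This is a simplified illustration and not the model itself: `dep_prob` is a hypothetical stand-in for the trained maximum entropy model and scores only a pair of bunsetsus, and the restriction of candidates to non-crossing modifiees reflects the no-crossing assumption stated in Section 2.1.

```python
def parse_backward(n, dep_prob):
    """Deterministic right-to-left analysis under the no-crossing assumption.

    n        -- number of bunsetsus in the sentence
    dep_prob -- hypothetical callable (i, j) -> probability that bunsetsu i
                depends on bunsetsu j (a stand-in for the trained model)
    Returns heads, where heads[i] is the modifiee chosen for bunsetsu i;
    the rightmost bunsetsu has heads[n - 1] = None.
    """
    heads = [None] * n
    for i in range(n - 2, -1, -1):
        # Modifiees that cannot cross an already chosen dependency are
        # i + 1, the modifiee of i + 1, the modifiee of that one, and so on.
        candidates = []
        j = i + 1
        while j is not None:
            candidates.append(j)
            j = heads[j]
        # Deterministic choice: keep only the single best candidate.
        heads[i] = max(candidates, key=lambda j: dep_prob(i, j))
    return heads


# Toy scorer (an assumption for illustration): everything prefers index 2.
print(parse_backward(3, lambda i, j: 1.0 if j == 2 else 0.1))
```

Replacing the single best choice with the k best partial analyses at each step would turn this into the beam search mentioned above.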
3.2 Sentence Boundary Detection
Based on Statistical Machine
Translation (Conventional method
(Shitaoka et al., 2002))
The framework for statistical machine trans-
lation is formulated as follows. Given in-
put sequence X, the goal of statistical ma-
chine translation is to find the best output se-
quence, Y , that maximizes conditional proba-
bility P(Y |X):
$$\max_{Y} P(Y|X) = \max_{Y} P(Y)\,P(X|Y) \qquad (1)$$
The problem of sentence boundary detection
can be reduced to the problem of translating
a sequence of words, $X$, that does not include
periods but instead includes pauses into
a sequence of words, $Y$, that includes periods.
Specifically, in places where a pause
might be converted into a period, which means
$P(X|Y) = 1$, the decision whether a period
should be inserted or not is made by comparing
language model scores $P(Y')$ and $P(Y'')$. Here,
the difference between $Y'$ and $Y''$ is that one
includes a period in a particular place and the
other one does not.
We used a model that uses pause duration
and surface expressions around pauses as trans-
lation model P(X|Y ). We used expressions
around absolute and strong boundaries as de-
scribed in Section 2.2 as surface expressions
around pauses. A pause preceding or follow-
ing surface expressions can be converted into
a period. Specifically, pauses following expressions
“と (to)”, “ない (nai)”, and “た (ta)”, and
pauses preceding expression “で (de)”, can be
converted into a period when these pauses are
longer than average. A pause preceding or fol-
lowing other surface expressions can be con-
verted into a period even if its duration is short.
To calculate P(Y ), we used a word 3-gram
model trained with transcriptions in the CSJ.
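The comparison of the two language model scores can be sketched as below. This covers only the language model step, not the translation model. Several details are assumptions made for illustration: the pause is represented by a `<pause>` token that is either kept or replaced by a period, and add-one smoothing is used because the smoothing method of the original 3-gram model is not stated here.

```python
import math
from collections import Counter


def train_trigram(sentences):
    """Count trigrams and their two-word contexts over padded token lists."""
    tri, bi, vocab = Counter(), Counter(), set()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens
        vocab.update(padded)
        for k in range(2, len(padded)):
            tri[tuple(padded[k - 2:k + 1])] += 1
            bi[tuple(padded[k - 2:k])] += 1
    return tri, bi, len(vocab)


def log_prob(tokens, model):
    """Trigram log-probability with add-one smoothing (an assumption)."""
    tri, bi, vocab_size = model
    padded = ["<s>", "<s>"] + tokens
    total = 0.0
    for k in range(2, len(padded)):
        context = tuple(padded[k - 2:k])
        trigram = tuple(padded[k - 2:k + 1])
        total += math.log((tri[trigram] + 1) / (bi[context] + vocab_size))
    return total


def prefers_period(tokens, pause_index, model):
    """Compare the sequence with a period at the pause against the original."""
    with_period = tokens[:pause_index] + ["."] + tokens[pause_index + 1:]
    return log_prob(with_period, model) > log_prob(tokens, model)
```

A period is inserted at a candidate pause only when `prefers_period` returns `True` for that position.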
3.3 Sentence Boundary Detection
Using Dependency Information
(Method 1)
There are three assumptions that should be sat-
isfied by the rightmost bunsetsu in every sen-
tence. In the following, this bunsetsu is referred
to as the target bunsetsu.
(1) One or more bunsetsus depend on the
target bunsetsu. (Figure 2)
Since every bunsetsu depends on another bun-
setsu in the same sentence, the second rightmost
bunsetsu always depends on the rightmost bun-
setsu in any sentence, except in inverted sen-
tences. In inverted sentences in this study, we
changed the direction of all dependencies to that
from left to right.
Figure 2: One or more bunsetsus depend on
the target bunsetsu. (“|” represents a sentence
boundary.)
(2) There is no bunsetsu that depends
on a bunsetsu beyond the target bunsetsu.
(Figure 3)
Each bunsetsu in a sentence depends on a bun-
setsu in the same sentence.
(3) The dependency probability of the target
bunsetsu is low. (Figure 4)
The target bunsetsu does not depend on any
bunsetsu, so the probability that it depends on
a following bunsetsu should be low.
Figure 3: There is no bunsetsu that depends on
a bunsetsu beyond the target bunsetsu.
Figure 4: Probability of the target bunsetsu is
low.
Bunsetsus that satisfy assumptions (1)-(3)
are extracted as rightmost bunsetsu candidates
in a sentence. Then, for every point follow-
ing the extracted bunsetsus and for every pause
preceding or following the expressions described
in Section 3.2, a decision is made regarding
whether a period should be inserted or not.
In assumption (2), bunsetsus that depend on a
bunsetsu beyond 50 bunsetsus are ignored be-
cause no such long-distance dependencies were
found in the 188 talks in the CSJ used in our ex-
periments. Bunsetsus whose dependency prob-
ability is very low are also ignored because there
is a high possibility that these bunsetsus’ depen-
dencies are incorrect. Let this threshold proba-
bility be p, and let the threshold probability in
assumption (3) be q. The optimal parameters p
and q are determined by using held-out data.
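The candidate extraction described above might be sketched as follows. The function name, the head-index representation, and the exact comparison operators used with p and q are assumptions for illustration; the original implementation may differ in these details.

```python
def boundary_candidates(heads, probs, p=0.0, q=0.9, max_dist=50):
    """Return indices of bunsetsus that satisfy assumptions (1)-(3).

    heads[i] -- index of the bunsetsu that bunsetsu i is predicted to depend on
    probs[i] -- dependency probability assigned to that prediction
    Dependencies with probability below p, or reaching beyond max_dist
    bunsetsus, are treated as unreliable and ignored.
    """
    n = len(heads)
    reliable = [probs[i] >= p and heads[i] - i <= max_dist for i in range(n)]
    candidates = []
    for t in range(n):
        left = [i for i in range(t) if reliable[i]]
        has_modifier = any(heads[i] == t for i in left)   # assumption (1)
        none_beyond = all(heads[i] <= t for i in left)    # assumption (2)
        low_prob = probs[t] < q                           # assumption (3)
        if has_modifier and none_beyond and low_prob:
            candidates.append(t)
    return candidates


# Two toy sentences, bunsetsus 0-2 and 3-4, analyzed as one sequence.
print(boundary_candidates([2, 2, 3, 4, 5], [0.9, 0.95, 0.5, 0.9, 0.5]))
```

The quadratic loop is kept for readability; a single left-to-right pass that tracks the furthest modifiee seen so far would give the same result more efficiently.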
In this approach, about one third of all
bunsetsu boundaries are extracted as sentence
boundary candidates. So, an output sequence
is selected from all possible conversion patterns
generated using two words to the left and two
words to the right of each sentence boundary
candidate. To perform this operation, we used
a beam search with a width of 10 because a
large number of conversion patterns can be
generated.
3.4 Sentence Boundary Detection
Based on Machine Learning
(Method 2)
We used a Support Vector Machine (SVM) as the
machine learning model and approached the
problem of sentence boundary detection as a
text chunking task. We used YamCha (Kudo
and Matsumoto, 2001) as a text chunker, which
is based on SVM and uses polynomial kernel
functions. To determine the appropriate chunk
label for a target word, YamCha uses two words
to the right and two words to the left of the
target word as static features, and it uses
chunk labels that are dynamically assigned to
the two preceding or the two following words
as dynamic features, depending on the analysis
direction. To solve the multi-class problem, we
used pairwise classification. This method generates
N(N − 1)/2 classifiers, one for each pair of
the N classes, and makes a final decision by
weighted voting.
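Pairwise classification with weighted voting can be illustrated as follows. This is a generic sketch, not YamCha's internal code: `decide` is a hypothetical stand-in for the trained binary classifiers, and how the vote weights are derived is an assumption left to the caller.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_vote(classes, decide):
    """Combine one-against-one decisions by weighted voting.

    classes -- list of N class labels
    decide  -- hypothetical callable (a, b) -> (winner, weight) standing in
               for the binary classifier trained on classes a and b
    Returns the winning label and the number of binary classifiers consulted.
    """
    votes = defaultdict(float)
    pairs = list(combinations(classes, 2))  # N * (N - 1) / 2 pairs
    for a, b in pairs:
        winner, weight = decide(a, b)
        votes[winner] += weight
    return max(votes, key=votes.get), len(pairs)


def toy_decide(a, b):
    """Toy rule for illustration: label "E" always wins with weight 1.0."""
    return ("E", 1.0) if "E" in (a, b) else (a, 0.5)


print(pairwise_vote(["I", "O", "E"], toy_decide))
```

With three labels, three binary classifiers are consulted, which matches N(N − 1)/2 for N = 3.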
The features used in our experiments are the
following:
1. Morphological information of the three words
to the right and three words to the left of the
target word, such as character strings, pronun-
ciation, part of speech, type of inflection, and
inflection form
2. Pause duration normalized in terms of Maha-
lanobis distance
3. Clause boundaries
4. Dependency probability of the target bunsetsu
5. The number of bunsetsus that depend on the
target bunsetsu and their dependency proba-
bilities
We used the IOE labeling scheme for proper
chunking, and the following parameters for
YamCha.
• Degree of polynomial kernel: 3rd
• Analysis direction: Left to right
• Multi-class method: Pairwise
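To show how sentence boundary detection maps onto chunking labels, the sketch below converts segmented text to labels and back. It assumes a variant of the IOE scheme in which the last word of every sentence is labeled E and all other words are labeled I; the exact variant used in the experiments is not specified here, and the function names are hypothetical.

```python
def to_ioe(sentences):
    """Label each word I, except the last word of every sentence, labeled E."""
    words, labels = [], []
    for sent in sentences:
        if not sent:  # skip empty sentences so that every label has a word
            continue
        words.extend(sent)
        labels.extend(["I"] * (len(sent) - 1) + ["E"])
    return words, labels


def from_ioe(words, labels):
    """Cut the word sequence after every word labeled E."""
    sentences, current = [], []
    for word, label in zip(words, labels):
        current.append(word)
        if label == "E":
            sentences.append(current)
            current = []
    if current:  # trailing words without a closing E label
        sentences.append(current)
    return sentences


words, labels = to_ioe([["kare-wa", "yukkuri", "aruite-iru"], ["hai"]])
print(labels)
```

Under this encoding, detecting a sentence boundary amounts to predicting the label E for the word that precedes it.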
4 Experiments and Discussion
In our experiments, we used the transcriptions
of 188 talks in the CSJ. We used 10 talks for
testing. Dependency structure analysis results
were evaluated for closed- and open-test data in
terms of accuracy, which was defined as the percentage
of correct dependencies out of all dependencies.
In Tables 1 to 3, we use the words “closed”
and “open” to describe the results obtained for
closed- and open-test data, respectively. Sen-
tence boundary detection results were evaluated
in terms of F-measure.
First, we show the baseline accuracy of depen-
dency structure analysis and sentence boundary
detection. The method described in Section 3.2
was used as a baseline method for sentence
boundary detection (Process 1 in Figure 1). To
train the language model represented by P(Y ),
we used the transcriptions of 178 talks exclud-
ing the test data. The method described in Sec-
tion 3.1 was used as a baseline method for de-
pendency structure analysis. (Process 3 in Fig-
ure 1) As sentence boundaries, we used the re-
sults of the baseline method for sentence bound-
ary detection. We obtained an F-measure of
76.6, a recall of 64.5%, and a precision of 94.2%
for the sentence boundary detection in our ex-
periments. The dependency structure analysis
accuracy was 75.2% for the open data and 80.7%
for the closed data.
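The sentence boundary scores can be recomputed from the raw counts. The sketch below assumes that the F-measure is the balanced harmonic mean of recall and precision, which is not stated explicitly in the text, and uses the baseline counts reported in Table 1 (727 correct boundaries, 1,127 reference boundaries, 772 detected boundaries).

```python
def prf(correct, reference, predicted):
    """Recall, precision, and balanced F-measure, all in percent."""
    recall = 100.0 * correct / reference
    precision = 100.0 * correct / predicted
    f_measure = 2 * recall * precision / (recall + precision)
    return recall, precision, f_measure


r, p, f = prf(correct=727, reference=1127, predicted=772)
print(f"recall={r:.1f} precision={p:.1f} F={f:.1f}")
# prints: recall=64.5 precision=94.2 F=76.6
```

The result agrees with the baseline row of Table 1.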
The dependency probability of the rightmost
bunsetsus in a given sentence was not calculated
in our model. So, we assumed that the right-
most bunsetsus depended on the next bunsetsu
and that the dependency probability was 0.5
when we used dependency information in the
experiments described in the following sections.
4.1 Sentence Boundary Detection
Results Obtained by Method 1
We evaluated the results obtained by the
method described in Section 3.3. The results
of baseline dependency structure analysis were
used as dependency information (Process 5 in
Figure 1).
First, we investigated the optimal values of
parameters p and q described in Section 3.3 by
using held-out data, which diﬀered from the test
data and consisted of 15 talks. The optimal val-
ues of p and q were, respectively, 0 and 0.9 for
the open-test data, and 0 and 0.8 for the closed-
test data. These values were used in the follow-
ing experiments. The value of p was 0, and these
results show that bunsetsus that depended on a
bunsetsu beyond 50 bunsetsus were ignored as
described in assumption (2) in Section 3.3.
The obtained results are shown in Table 1.
When dependency information was used, the F-
measure increased by approximately 1.4 for the
open-test data and by 2.0 for the closed test
data, respectively. Although the accuracy of de-
pendency structure analysis for closed test data
was about 5.5% higher than that for the open-
test data, the diﬀerence between the accuracies
of sentence boundary detection for the closed-
and open-test data was only about 0.6%. These
results indicate that equivalent accuracies can
be obtained for both open- and closed-test data
in detecting dependencies related to sentence
boundaries.
When all the extracted candidates were considered
as sentence boundaries without using
language models, the accuracy of sentence
boundary detection obtained by using the baseline
method was 68.2% (769/1,127) in recall and
81.5% (769/943) in precision, and that obtained
by using Method 1 was 87.2% (983/1,127) in recall
and 27.7% (983/3,544) in precision. These results
show that an additional 214 sentence boundary
candidates were correctly extracted by using
dependency information. However, only
108 sentence boundaries were chosen out of
the 214 candidates when language models were
used. We investigated in detail the points
that were not chosen and found errors in noun-final
clauses, clauses where the rightmost constituents
were adjectives or verbs such as
“と思う (to-omou, think)” or “は難しい
(wa-muzukashii, difficult)”, and clauses where the
rightmost constituents were “というのは
(to-iu-no-wa, because)” and “としては (to-si-te-wa,
as)”, and so on. Some errors, except for
those in noun-final clauses, could have been correctly
detected if we had had more training
data.

Table 1: Sentence boundary detection results
obtained by using dependency information.

                                       Recall               Precision            F
With dependency information (open)     74.1% (835/1,127)    82.5% (835/1,012)    78.0
With dependency information (closed)   74.2% (836/1,127)    83.5% (836/1,001)    78.6
Baseline                               64.5% (727/1,127)    94.2% (727/772)      76.6
We also found that periods were sometimes
erroneously inserted when the preceding expressions
were “が (ga, but)”, “まして (mashite,
and)”, and “けれども (keredomo, but)”, which
are typically the rightmost constituents of a sentence,
as well as “て (te, and)”, which typically
is not the rightmost constituent of a sentence.
The language models were not good at
discriminating between such subtle differences.
4.2 Sentence Boundary Detection
Results Obtained by Method 2
We evaluated the results obtained by the
method described in Section 3.4 (Process 6 in
Figure 1). For training, we used 178 talks ex-
cluding test data.
The results are shown in Table 2. The F-
measure was about 6.9 points higher than that
described in Section 4.1. The results show
that the approach based on machine learning
is more eﬀective than that based on statisti-
cal machine translation. The results also show
that the accuracy of sentence boundary detec-
tion can be increased by using dependency in-
formation in Method 2. However, we found that
the amount of accuracy improvement achieved
by using dependency information depended on
the method used. This may be because other
features used in SVM may provide information
similar to dependency information. For exam-
ple, Feature 1 described in Section 3.4 might
provide information similar to that in Features
4 and 5. Although in our experiments we used
only three words to the right and three words
to the left of the target word, the degradation
in accuracy without dependency information
was slight. This may be because long-distance
dependencies may not be related to sentence
boundaries, or because Feature 5 does not con-
tribute to increasing the accuracy because the
accuracy of dependency structure analysis in de-
tecting long-distance dependencies is not high.
Table 2: Sentence boundary detection results
obtained by using SVM.

                                       Recall               Precision          F
With dependency information (open)     80.0% (902/1,127)    90.3% (902/999)    84.9
With dependency information (closed)   79.7% (900/1,127)    90.5% (900/994)    84.9
Without dependency information         79.3% (894/1,127)    90.1% (894/992)    84.4
Table 3: Dependency structure analysis results
obtained with automatically detected sentence
boundaries.

                              Open     Closed
With results in Section 4.1   75.8%    81.2%
With results in Section 4.2   77.2%    82.5%
Baseline                      75.2%    80.7%
4.3 Dependency Structure Analysis
Results
We evaluated the results of dependency struc-
ture analysis obtained when sentence bound-
aries detected automatically by the two meth-
ods described above were used as inputs (Pro-
cess 7 in Figure 1). The results are shown in
Table 3. The accuracy of dependency structure
analysis improved by about 2% when the most
accurate and automatically detected sentence
boundaries were used as inputs. This is because
more sentence boundaries were detected
correctly, and the number of bunsetsus that
depended on those in other sentences decreased.
We investigated the accuracy of dependency
structure analysis when 100% accurate sentence
boundaries were used as inputs. The accuracy
was 80.1% for the open-test data, and 86.1%
for the closed-test data. Even when the sen-
tence boundary detection was perfect, the er-
ror rate was approximately 14% even for the
closed-test data. The accuracy of dependency
structure analysis for spoken text was about 8%
lower than that for written text (newspapers).
We speculate that this is because spoken text
has no punctuation marks and many bunsetsus
depend on others far from them because of in-
sertion structures. These problems need to be
addressed in future studies.
5 Conclusion
This paper described a project to detect depen-
dencies between bunsetsus and sentence bound-
aries in a spontaneous speech corpus. It is
more difficult to detect dependency structures
in spontaneous spoken speech than in written
text. The biggest problem is that sentence
boundaries are ambiguous. We proposed two
methods for improving the accuracy of sentence
boundary detection in spontaneous Japanese
speech. Using these methods, we obtained an
F-measure of 84.9 for the accuracy of sentence
boundary detection. The accuracy of depen-
dency structure analysis was also improved from
75.2% to 77.2% by using automatically detected
sentence boundaries. The accuracy of depen-
dency structure analysis and that of sentence
boundary detection were improved by interac-
tively using automatically detected dependency
information and sentence boundaries.
There are several future directions. In the fu-
ture, we would like to solve the problems that
we found in our experiments. In particular, we
want to reduce the number of errors due to in-
serted structures and solve other problems de-
scribed in Section 2.1.

References

Masayuki Asahara and Yuji Matsumoto. 2003. Filler and
Disfluency Identification Based on Morphological Analysis
and Chunking. In Proceedings of the ISCA & IEEE Work-
shop on Spontaneous Speech Processing and Recognition,
pages 163–166.

Michael Collins. 1996. A New Statistical Parser Based on
Bigram Lexical Dependencies. In Proceedings of the ACL,
pages 184–191.

Masakazu Fujio and Yuji Matsumoto. 1998. Japanese Depen-
dency Structure Analysis based on Lexicalized Statistics.
In Proceedings of the EMNLP, pages 87–96.

Masahiko Haruno, Satoshi Shirai, and Yoshifumi Ooyama.
1998. Using Decision Trees to Construct a Practical
Parser. In Proceedings of the COLING-ACL, pages 505–511.

Taku Kudo and Yuji Matsumoto. 2000. Japanese Dependency
Structure Analysis Based on Support Vector Machines.
In Proceedings of the EMNLP, pages 18–25.

Taku Kudo and Yuji Matsumoto. 2001. Chunking with sup-
port vector machines. In Proceedings of the NAACL.

Sadao Kurohashi and Makoto Nagao. 1997. Building a
Japanese Parsed Corpus while Improving the Parsing Sys-
tem. In Proceedings of the NLPRS, pages 451–456.

Kikuo Maekawa, Hanae Koiso, Sadaoki Furui, and Hitoshi
Isahara. 2000. Spontaneous Speech Corpus of Japanese.
In Proceedings of the LREC2000, pages 947–952.

Takehiko Maruyama, Hideki Kashioka, Tadashi Kumano, and
Hideki Tanaka. 2003. Rules for Automatic Clause Boundary
Detection and Their Evaluation. In Proceedings of
the Ninth Annual Meeting of the Association for Natural
Language Processing, pages 517–520. (in Japanese).

Shigeki Matsubara, Takahisa Murase, Nobuo Kawaguchi, and
Yasuyoshi Inagaki. 2002. Stochastic Dependency Parsing
of Spontaneous Japanese Spoken Language. In Proceedings
of the COLING2002, pages 640–645.

Jeﬀrey C. Reynar and Adwait Ratnaparkhi. 2000. A Max-
imum Entropy Approach to Identifying Sentence Bound-
aries. In Proceedings of the ANLP, pages 16–19.

Kazuya Shitaoka, Tatsuya Kawahara, and Hiroshi G. Okuno.
2002. Automatic Transformation of Lecture Transcrip-
tion into Document Style using Statistical Framework. In
IPSJ–WGSLP SLP-41-3, pages 17–24. (in Japanese).

Katsuya Takanashi, Takehiko Maruyama, Kiyotaka Uchi-
moto, and Hitoshi Isahara. 2003. Identification of “Sen-
tences” in Spontaneous Japanese — Detection and Mod-
ification of Clause Boundaries —. In Proceedings of the
ISCA & IEEE Workshop on Spontaneous Speech Process-
ing and Recognition, pages 183–186.

Kiyotaka Uchimoto, Satoshi Sekine, and Hitoshi Isahara.
1999. Japanese Dependency Structure Analysis Based on
Maximum Entropy Models. In Proceedings of the EACL,
pages 196–203.

Kiyotaka Uchimoto, Masaki Murata, Satoshi Sekine, and Hi-
toshi Isahara. 2000. Dependency Model Using Posterior
Context. In Proceedings of the IWPT, pages 321–322.
