Dependency-based Sentence Alignment for Multiple Document
Summarization
Tsutomu HIRAO and Jun SUZUKI and Hideki ISOZAKI and Eisaku MAEDA
NTT Communication Science Laboratories, NTT Corp.
2-4 Hikaridai, Seika-cho, Soraku-gun, Kyoto, 619-0237 Japan
{hirao,jun,isozaki,maeda}@cslab.kecl.ntt.co.jp
Abstract
In this paper, we describe a method of automatic
sentence alignment for building extracts from ab-
stracts in automatic summarization research. Our
method is based on two steps. First, we introduce
the “dependency tree path” (DTP). Next, we calcu-
late the similarity between DTPs based on the ESK
(Extended String Subsequence Kernel), which con-
siders sequential patterns. By using these proce-
dures, we can derive one-to-many or many-to-one
correspondences among sentences. Experiments us-
ing different similarity measures show that DTP
consistently improves the alignment accuracy and
that ESK gives the best performance.
1 Introduction
Many researchers who study automatic summariza-
tion want to create systems that generate abstracts
of documents rather than extracts. We can gener-
ate an abstract by utilizing various methods, such
as sentence compaction, sentence combination, and
paraphrasing. In order to implement and evalu-
ate these techniques, we need large-scale corpora
in which the original sentences are aligned with
summary sentences. These corpora are useful for
training and evaluating sentence extraction systems.
However, it is costly to create these corpora.
Figure 1 shows an example of summary sentences
and original sentences from TSC-2 (Text Summa-
rization Challenge 2) multiple document summa-
rization data (Okumura et al., 2003). From this ex-
ample, we can see many-to-many correspondences.
For instance, summary sentence (A) consists of a
part of source sentence (A). Summary sentence (B)
consists of parts of source sentences (A), (B), and
(C). It is clear that the correspondence among the
sentences is very complex. Therefore, robust and
accurate alignment is essential.
In order to achieve such alignment, we need not
only syntactic information but also semantic infor-
mation. Therefore, we combine two methods. First,
we introduce the “dependency tree path” (DTP) for
Source(A): [Japanese text] First, we stop the new investment of 64-Mega bit
memory from competitive companies, such as in Korea or Taiwan, and we begin
the investment for development of valuable system-on-chip or 256-Mega bit
DRAM from now on.
Source(B): [Japanese text] On a long-term target, we plan to reduce the rate
of general-purpose semiconductor enterprises that produce DRAM for personal
computers.
Source(C): [Japanese text] From now on, we will be supplied with DRAM from
Taiwan.
Summary(A): [Japanese text] We stopped the new investment of 64-Mega bit
DRAM.
Summary(B): [Japanese text] We begin the investment for valuable development
and will be supplied with general-purpose DRAMs for personal computers from
Taiwan in the long run.
Figure 1: An example of summary sentences and their source sentences from
TSC-2 multiple document summarization data. Underlined strings are used in
summary sentences.
syntactic information. Second, we introduce the
“Extended String Subsequence Kernel” (ESK) for
semantic information.
Experimental results using different similarity
measures show that DTP consistently improves
alignment accuracy and ESK enhances the perfor-
mance.
Sentence 1: [Japanese text]
watashi ga kinjo no keisatsu ni otoshimono wo todoke ta.
Sentence 2: [Japanese text]
kinjo no keisatsu ni otoshimono wo watashi ga todoke ta.
Figure 2: Examples of sentences that have the same meaning.
2 Related Work
Several methods have been proposed to realize au-
tomatic alignment between abstracts and sentences
in source documents.
Banko et al. (1999) proposed a method based
on sentence similarity using bag-of-words (BOW)
representation. For each sentence in the given ab-
stract, the corresponding source sentence is deter-
mined by combining the similarity score and heuristic
rules. However, it is known that bag-of-words rep-
resentation is not optimal for short texts like single
sentences (Suzuki et al., 2003).
Marcu (1999) regards a sentence as a set of
“units” that correspond to clauses and defines sim-
ilarity between units based on BOW representa-
tion. Next, the best source sentences are extracted
in terms of “unit” similarity. Jing and McKeown
(1999) proposed a bigram-based similarity using
the Hidden Markov Model. Barzilay and Elhadad
(2003) combine edit distance and
context information around sentences. However,
these three methods tend to be strongly influenced
by word order. When the summary sentence and
the source sentences disagree in terms of word or-
der, the methods fail to work well.
The supervised learning-based method called
SimFinder was proposed by Hatzivassiloglou et al.
(Hatzivassiloglou et al., 1999; Hatzivassiloglou et
al., 2001). They translate a sentence into a feature
vector based on word counts and proper nouns, and
so on, and then sentence pairs are classified into
“similar” or not. Their approach is effective when
a lot of training data is available. However, the hu-
man cost of making this training data cannot be dis-
regarded.
3 An Alignment Method based on Syntax
and Semantics
Figure 2 shows two sentences that
have different word order but the same meaning.
The English translation is “I took the lost article to
the neighborhood police.”
todoke ta (took)
├── watashi ga (I)
├── keisatsu ni (to the police)
│   └── kinjo no (neighborhood)
└── otoshimono wo (the lost article)
Figure 3: An example of a dependency tree.
Since conventional techniques other than BOW
are strongly influenced by word order, they are frag-
ile when the word order changes.
3.1 Dependency Tree Path (DTP)
When we unify two sentences, some elements be-
come longer, and the word order may be changed to
improve readability. However, when we rephrase
sentences, the dependency structure does not change
in many cases, even if the word order changes. For
example, the two sentences in Figure 2 share the
same dependency structure, shown in Figure 3. Therefore, we
transform a sentence into its dependency structure.
This allows us to consider a sentence as a set of de-
pendency tree paths from a leaf to the root node of
the tree.
For instance, the two sentences in Figure 2 can be
transformed into the following DTPs.
watashi ga → todoke ta (I took)
kinjo no → keisatsu ni → todoke ta (took to the neighborhood police)
otoshimono wo → todoke ta (took the lost article)
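For illustration, the leaf-to-root paths can be read off a child-to-head
table with a few lines of code. The following sketch (our own helper names,
not the paper's) reconstructs the three DTPs of Figure 3 from its dependency
links:

```python
def dtp_set(heads):
    """Enumerate dependency tree paths (DTPs): one path from each leaf
    of the dependency tree up to the root.

    heads: dict mapping each chunk to its head chunk;
           the root is the only chunk that never appears as a key.
    Returns a set of tuples, each a leaf-to-root path.
    """
    dependents = set(heads)            # chunks that modify something
    modified = set(heads.values())     # chunks that are modified
    leaves = dependents - modified     # chunks nothing depends on
    paths = set()
    for leaf in leaves:
        path = [leaf]
        while path[-1] in heads:       # climb until the root
            path.append(heads[path[-1]])
        paths.add(tuple(path))
    return paths

# Dependency links (child -> head) of the sentence in Figure 3.
heads = {
    "watashi ga": "todoke ta",
    "keisatsu ni": "todoke ta",
    "otoshimono wo": "todoke ta",
    "kinjo no": "keisatsu ni",
}
```

Applying `dtp_set(heads)` yields exactly the three paths listed above.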
3.2 An Alignment Algorithm using DTPs
In this section, we describe a method that aligns
source sentences with the summary sentences in an
abstract.
Our algorithm is very simple. We take the corre-
sponding sentence to be the one whose DTP is most
similar to that of the summary sentence. The algo-
rithm consists of the following steps:
Step 0 Transform all source sentences into DTPs.
Step 1 For each sentence s in the abstract, apply
Step 2 and Step 3.
Step 2 Transform s into a DTP set. Here, D(s)
denotes s's DTP set, and D(S_j) denotes the DTP
set of the j-th source sentence S_j.
[Figure: two DTPs whose nodes carry words (t1, t2, t3) and word senses
(m1, ..., m4); common subsequences such as t1-t2, t1-*-t3, t2-m3, and
m1-t2-m4 receive values weighted by powers of the decay factor λ.]
Figure 4: ESK with node sequence.
Step 3 For each d ∈ D(s), we align an optimal
source sentence as follows:
We define sim(d, S_j) =def max_{d' ∈ D(S_j)} sim(d, d').
With d, we align a source sentence that satisfies
argmax_{S_j} sim(d, S_j).
The above procedure allows us to derive many-
to-many correspondences.
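The steps above can be sketched as follows, with a stand-in similarity
function (plain token overlap here; the paper's actual measure is the
ESK-based similarity of Section 3.3), and toy DTPs of our own invention:

```python
def overlap_sim(d1, d2):
    """Stand-in for sim(d, d'): Jaccard overlap of the chunks in two DTPs."""
    s1, s2 = set(d1), set(d2)
    return len(s1 & s2) / max(len(s1 | s2), 1)

def align(summary_dtps, source_dtp_sets, sim=overlap_sim):
    """For each DTP d of a summary sentence, pick the source sentence j
    maximizing sim(d, S_j) = max over d' in D(S_j) of sim(d, d')."""
    alignment = []
    for d in summary_dtps:
        best_j = max(range(len(source_dtp_sets)),
                     key=lambda j: max(sim(d, dp) for dp in source_dtp_sets[j]))
        alignment.append((d, best_j))
    return alignment

# Toy example (hypothetical data): the summary DTP should align to source 1.
sources = [
    [("koko de", "hataraku")],                 # source sentence 0
    [("watashi ga", "todoke ta"),
     ("otoshimono wo", "todoke ta")],          # source sentence 1
]
summary = [("otoshimono wo", "todoke ta")]
```

Because each summary DTP is aligned independently, one summary sentence may
be linked to several source sentences, giving the many-to-many behavior
described above.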
3.3 Similarity Metrics
We need a similarity metric to rank DTP similar-
ity. The following cosine measure (Hearst, 1997) is
used in many NLP tasks.
sim_cos(T_1, T_2) = Σ_t w(t, T_1) w(t, T_2) /
( sqrt(Σ_t w(t, T_1)²) · sqrt(Σ_t w(t, T_2)²) )    (1)

Here, w(t, T_1) and w(t, T_2) denote the weight of term t in
texts T_1 and T_2, respectively. Note that syntactic and se-
mantic information is lost in the BOW representa-
tion.
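This weakness is easy to demonstrate: under equation (1) with raw term
frequencies as weights, the two sentences of Figure 2 are indistinguishable,
because they contain exactly the same words. A minimal sketch:

```python
import math
from collections import Counter

def cosine(t1, t2):
    """Equation (1) with raw term frequencies as the weights w(t, T)."""
    w1, w2 = Counter(t1), Counter(t2)
    num = sum(w1[t] * w2[t] for t in w1)
    den = (math.sqrt(sum(v * v for v in w1.values()))
           * math.sqrt(sum(v * v for v in w2.values())))
    return num / den if den else 0.0

# The two sentences of Figure 2: same words, different order.
s1 = "watashi ga kinjo no keisatsu ni otoshimono wo todoke ta".split()
s2 = "kinjo no keisatsu ni otoshimono wo watashi ga todoke ta".split()
```

Here `cosine(s1, s2)` is 1.0: BOW treats any permutation of a sentence as
identical, which is why measures sensitive to structure are needed.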
In order to solve this problem, we use similarity
measures based on word co-occurrences. For ex-
ample, N-gram co-occurrence is
used for evaluating machine translations (Papineni
et al., 2002). String Subsequence Kernel (SSK)
(Lodhi et al., 2002) and Word Sequence Kernel
(WSK) (Cancedda et al., 2003) are extensions of n-
gram-based measures used for text categorization.
In this paper, we compare WSK to its extension,
the Extended String Subsequence Kernel (ESK).
First, we describe WSK. WSK receives two se-
quences of words as input and maps each of them
into a high-dimensional vector space. WSK’s value
is just the inner product of the two vectors.
Table 1: Components of the vectors corresponding to
'abaca' and 'abbab.' Subsequences common to both
strings are marked with an asterisk.

subsequence   abaca       abbab
abb           0           1 + 2λ²
aba *         1 + λ²      2λ
abc           λ           0
aab           0           λ²
aac           λ           0
aaa           λ²          0
aca           λ² + 1      0
ab *          1           2 + λ + λ³
aa *          2λ + λ³     λ²
ac            1 + λ²      0
ba *          1 + λ²      1 + λ
bb            0           1 + λ + λ²
bc            λ           0
ca            1           0
a *           3           2
b *           1           3
c             1           0

For instance, the WSK value for 'abaca' and
'abbab' is determined as follows. The subsequences
of length three or less are shown in Table 1.
Here, λ is a decay parameter for the number of
skipped words. For example, subsequence 'aba' ap-
pears in 'abaca' once without skips. In addition, it
appears again with two skips, i.e., 'ab**a.' There-
fore, abaca's vector has "1 + λ²" in the component
corresponding to 'aba.' From Table 1, we can cal-
culate the WSK value as follows:
K_wsk(abaca, abbab) = 2λ(1 + λ²) + (2 + λ + λ³)
  + λ²(2λ + λ³) + (1 + λ)(1 + λ²) + 3 · 2 + 1 · 3
  = 12 + 4λ + λ² + 6λ³ + λ⁵    (2)

Table 2: Description of TSC data
                   single   multiple
# of doc clusters  —        30
# of docs          30       224
# of sentences     881      2425
# of characters    34112    111472
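The component vectors of Table 1 can also be generated by brute-force
enumeration, which is a convenient way to check the kernel value (a naive
sketch for exposition only; practical implementations use the dynamic
programming of Lodhi et al. (2002)):

```python
from itertools import combinations

def wsk_features(s, lam):
    """Map a string to its subsequence feature vector. Each occurrence of
    a subsequence is discounted by lam per skipped position, i.e. by
    lam ** (span - length)."""
    phi = {}
    for n in range(1, len(s) + 1):
        for idx in combinations(range(len(s)), n):
            u = "".join(s[i] for i in idx)
            gaps = (idx[-1] - idx[0] + 1) - n
            phi[u] = phi.get(u, 0.0) + lam ** gaps
    return phi

def wsk(s, t, lam):
    """WSK value: inner product of the two subsequence feature vectors."""
    ps, pt = wsk_features(s, lam), wsk_features(t, lam)
    return sum(v * pt[u] for u, v in ps.items() if u in pt)
```

Multiplying the matching components of Table 1 and summing gives the
polynomial 12 + 4λ + λ² + 6λ³ + λ⁵ (15.03125 at λ = 0.5), which the
enumeration reproduces.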
In this way, we can measure the similarity be-
tween two texts. However, WSK disregards syn-
onyms, hyponyms, and hypernyms. Therefore, we
introduce ESK, an extension of WSK and a simplifi-
cation of HDAG Kernel (Suzuki et al., 2003). ESK
allows us to add word senses to each word. Here,
we do not try to disambiguate word senses, but use
all possible senses listed in a dictionary. Figure 4
shows an example of subsequences and their values.
The use of word sense yields flexible matching even
when paraphrasing is used for summary sentences.
Formally, ESK is defined as follows:

K_esk(T, U) = Σ_{m=1..d} Σ_{t_i ∈ T} Σ_{u_j ∈ U} K_m(t_i, u_j)    (3)

K_m(t_i, u_j) =
  { val(t_i, u_j)                        if m = 1
  { val(t_i, u_j) · K'_{m-1}(t_i, u_j)   otherwise    (4)

Here, t_i and u_j are nodes of T and U, respectively, and the
function val(t, u) returns the number of attributes common
to the given nodes t and u. K'_m(t_i, u_j) is defined as follows:

K'_m(t_i, u_j) =
  { 0                                               if j = 1
  { λ K'_m(t_i, u_{j-1}) + K''_m(t_i, u_{j-1})      otherwise    (5)

K''_m(t_i, u_j) is defined as follows:

K''_m(t_i, u_j) =
  { 0                                               if i = 1
  { λ K''_m(t_{i-1}, u_j) + K_m(t_{i-1}, u_j)       otherwise    (6)
Table 3: The distribution of aligned original sen-
tences corresponding to one summary sentence.

# of org. sents.   1            2            3+
A1 Short   167 (0.770)   49 (0.226)    1 (0.005)
A1 Long    283 (0.773)   73 (0.199)   10 (0.027)
A2 Short   157 (0.762)   46 (0.223)    3 (0.015)
A2 Long    299 (0.817)   59 (0.161)   11 (0.022)
A3 Short   198 (0.846)   34 (0.145)    2 (0.009)
A3 Long    359 (0.890)   39 (0.097)    5 (0.012)
B1 Short   295 (0.833)   45 (0.127)   14 (0.040)
B1 Long    530 (0.869)   65 (0.107)   15 (0.025)
B2 Short   156 (0.667)   58 (0.248)   20 (0.085)
B2 Long    312 (0.698)  104 (0.233)   31 (0.069)
B3 Short   191 (0.705)   62 (0.229)   18 (0.066)
B3 Long    392 (0.797)   76 (0.154)   24 (0.048)

Table 4: The distribution of aligned summary sen-
tences corresponding to one original sentence.

# of sum. sents.   1            2            3+
A1 Short   268 (1.000)    0            0
A1 Long    458 (0.994)    2 (0.006)    0
A2 Short   258 (1.000)    0            0
A2 Long    440 (1.000)    0            0
A3 Short   272 (1.000)    0            0
A3 Long    450 (1.000)    0            0
B1 Short   406 (0.974)   11 (0.026)    0
B1 Long    660 (0.964)   22 (0.032)    2 (0.004)
B2 Short   317 (0.975)    8 (0.025)    0
B2 Long    550 (0.945)   31 (0.053)    1 (0.002)
B3 Short   364 (0.989)    4 (0.011)    0
B3 Long    583 (0.965)   16 (0.025)    5 (0.010)
Finally, we define the similarity measure by nor-
malizing ESK. This similarity can be regarded as an
extension of the cosine measure.

sim_esk(T, U) = K_esk(T, U) / sqrt( K_esk(T, T) · K_esk(U, U) )    (7)
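The recursions (3)-(6) form a standard subsequence-kernel dynamic program.
The following sketch (our own code, not the authors') represents each node
as a set of attributes, so that with single-attribute nodes and equality
matching it reduces to WSK:

```python
import math

def esk(T, U, lam, d=3):
    """ESK dynamic program over node sequences T and U. Each node is a set
    of attributes (a word plus, e.g., its senses); val(t, u) counts shared
    attributes. Common subsequences up to length d are accumulated, with
    decay lam per skipped node in either sequence."""
    n, m = len(T), len(U)
    val = [[len(T[i] & U[j]) for j in range(m)] for i in range(n)]
    # K[i][j]: weighted sum over common subsequences of the current
    # length whose final matched pair is (T[i], U[j]).  Length 1 first.
    K = [[float(val[i][j]) for j in range(m)] for i in range(n)]
    total = sum(map(sum, K))
    for _ in range(1, d):
        # Kpp[i][j] = sum over i' < i of lam**(i-1-i') * K[i'][j]
        Kpp = [[0.0] * m for _ in range(n)]
        for i in range(1, n):
            for j in range(m):
                Kpp[i][j] = lam * Kpp[i - 1][j] + K[i - 1][j]
        # Kp[i][j] = sum over j' < j of lam**(j-1-j') * Kpp[i][j']
        Kp = [[0.0] * m for _ in range(n)]
        for i in range(n):
            for j in range(1, m):
                Kp[i][j] = lam * Kp[i][j - 1] + Kpp[i][j - 1]
        K = [[val[i][j] * Kp[i][j] for j in range(m)] for i in range(n)]
        total += sum(map(sum, K))
    return total

def sim_esk(T, U, lam, d=3):
    """Normalized ESK, the cosine-style similarity of equation (7)."""
    return esk(T, U, lam, d) / math.sqrt(esk(T, T, lam, d) * esk(U, U, lam, d))
```

As a sanity check, feeding 'abaca' and 'abbab' as sequences of
single-attribute nodes reproduces the WSK value of Section 3.3 (15.03125 at
λ = 0.5), and sim_esk of any sequence with itself is 1.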
4 Evaluation Settings
4.1 Corpus
We used the TSC-2 corpus, which includes both sin-
gle and multiple document summarization data. Ta-
ble 2 shows its statistics. For each data set, each
of three experts made short abstracts and long ab-
stracts.
For each data set, summary sentences were aligned
with source sentences. Table 3 shows the distribu-
tion of the numbers of aligned original sentences
for each summary sentence. The values in brack-
ets are ratios. Table 4 shows the distribution
of the number of aligned summary sentences for
each original sentence. These tables show that sen-
tences are often split and reconstructed. In partic-
ular, multiple document summarization data exhibit
Table 5: Evaluation results w/o DTP (single documents).

         ESK    WSK    BOW    2-gram 3-gram TREE
A1 Short 0.951  0.958  0.906  0.952  0.948  0.386
A1 Long  0.951  0.959  0.916  0.961  0.959  0.418
A2 Short 0.938  0.954  0.916  0.945  0.950  0.322
A2 Long  0.968  0.973  0.940  0.966  0.972  0.476
A3 Short 0.927  0.951  0.875  0.926  0.926  0.436
A3 Long  0.967  0.966  0.926  0.961  0.962  0.547

Table 6: Evaluation results with DTP (single documents).

         DTP(ESK)       DTP(WSK)       DTP(BOW) DTP(2-gram) DTP(3-gram)
A1 Short 0.966 (2,1.00) 0.957 (2,0.10) 0.955    0.952       0.952
A1 Long  0.960 (4,0.20) 0.957 (2,0.20) 0.960    0.951       0.949
A2 Short 0.973 (3,0.60) 0.957 (2,0.10) 0.959    0.957       0.956
A2 Long  0.977 (4,0.20) 0.974 (2,0.95) 0.972    0.973       0.975
A3 Short 0.962 (3,0.70) 0.962 (3,0.50) 0.964    0.962       0.960
A3 Long  0.967 (3,0.70) 0.969 (2,0.20) 0.962    0.960       0.960

Table 7: Effectiveness of DTP (single documents): difference
in points between Tables 6 and 5.

         ESK   WSK   BOW   2-gram 3-gram
A1 Short +1.5  -0.1  +4.9  ±0.0   +0.4
A1 Long  +0.9  -0.2  +4.4  -1.0   -1.0
A2 Short +3.5  +0.3  +4.3  +1.2   +0.6
A2 Long  +0.9  +0.1  +3.2  +0.7   +0.3
A3 Short +3.5  +1.1  +8.9  +3.6   +3.4
A3 Long  ±0.0  +0.3  +3.6  -0.1   -0.2
very complex correspondence because various sum-
marization techniques such as sentence compaction,
sentence combination, and sentence integration are
used.
4.2 Comparison of Alignment Methods
We compared the proposed methods with a baseline
algorithm using various similarity measures.
Baseline
This is a simple algorithm that compares sentences
to sentences. Each summary sentence is compared
with all source sentences, and the top k sentences
whose similarity score exceeds a certain threshold
are aligned.
DTP-based Method
This method was described in Section 3.2. In order
to obtain DTPs, we used the Japanese morpholog-
ical analyzer ChaSen and the dependency structure
analyzer CaboCha (Kudo and Matsumoto, 2002).
4.2.1 Similarity Measures
We utilized the following similarity measures.
BOW BOW is defined by equation (1). Here, we
use only nouns and verbs.
N-gram This is a simple extension of BOW. We
add n-gram sequences to BOW. We examined
“2-gram” (unigram + bigram) and “3-gram”
(unigram + bigram + trigram).
TREE The Tree Kernel (Collins and Duffy, 2001)
is a similarity measure based on the number of
common subtrees. We regard a sentence as a
dependency structure tree.
WSK We examined n = 2, 3, and 4, and
λ = 0.05, 0.1, 0.15, . . . , 1.
ESK We used the Japanese lexicon Goi-Taikei
(Ikehara et al., 1997) to obtain word senses.
The parameters n and λ were varied under the
same conditions as above.
4.3 Evaluation Metric
Each system’s alignment output was scored by the
average F-measure. For each summary sentence,
the following F-measure was calculated.
F-measure = (1 + β²) · Precision · Recall / (β² · Precision + Recall)    (8)

Here, Precision = C / S and Recall = C / H, where
S is the number of source sentences aligned by the
system for the summary sentence, C is the number
of correct source sentences in the system output, and
H is the number of source sentences aligned by the
human expert. We set β to 1. This F-measure was aver-
aged over all summary sentences.
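As a small sketch, the per-summary-sentence score with β = 1 can be computed
as follows (our own helper, operating on sets of source-sentence ids):

```python
def f_measure(system, gold, beta=1.0):
    """Equation (8). system and gold are the sets of source-sentence ids
    aligned to one summary sentence by the system and the human expert."""
    correct = len(set(system) & set(gold))
    if correct == 0:
        return 0.0
    precision = correct / len(system)   # C / S
    recall = correct / len(gold)        # C / H
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)
```

Averaging this value over all summary sentences gives the figures reported
in Tables 5, 6, 8, and 9.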
5 Results and Discussion
5.1 Single Document Summarization Data
Table 5 shows the results of the baseline method
(i.e., without DTPs) with the best threshold; Table 6 shows
Table 8: Evaluation results w/o DTP (multiple documents).

         ESK    WSK    BOW    2-gram 3-gram TREE
B1 Short 0.609  0.547  0.576  0.644  0.638  0.127
B1 Long  0.674  0.627  0.655  0.714  0.711  0.223
B2 Short 0.622  0.660  0.590  0.668  0.680  0.161
B2 Long  0.742  0.769  0.690  0.751  0.761  0.236
B3 Short 0.683  0.712  0.654  0.733  0.729  0.158
B3 Long  0.793  0.821  0.768  0.805  0.817  0.280

Table 9: Evaluation results with DTP (multiple documents).

         DTP(ESK)       DTP(WSK)       DTP(BOW) DTP(2-gram) DTP(3-gram)
B1 Short 0.746 (2,0.85) 0.734 (2,0.55) 0.719    0.725       0.728
B1 Long  0.802 (3,0.85) 0.797 (2,0.65) 0.784    0.797       0.797
B2 Short 0.726 (2,0.65) 0.741 (3,0.25) 0.710    0.720       0.721
B2 Long  0.808 (2,0.55) 0.800 (3,0.05) 0.797    0.797       0.794
B3 Short 0.790 (2,0.55) 0.786 (3,0.05) 0.748    0.768       0.760
B3 Long  0.845 (3,0.60) 0.861 (2,0.40) 0.828    0.835       0.830

Table 10: Effectiveness of DTP (multiple docu-
ments): difference in points between Tables 9 and 8.

         ESK    WSK    BOW    2-gram 3-gram
B1 Short +13.7  +18.7  +14.3  +8.1   +9.0
B1 Long  +12.8  +17.0  +12.9  +8.3   +8.6
B2 Short +10.4  +8.1   +12.0  +5.2   +4.1
B2 Long  +6.6   +3.1   +10.7  +4.6   +3.3
B3 Short +10.7  +7.4   +9.4   +3.5   +3.1
B3 Long  +5.2   +4.0   +6.0   +3.0   +1.3
the results of using DTPs with the best n and λ,
which are shown in brackets. From the results, we
can see the effectiveness of DTPs because Table
6 shows better performance than Table 5 in most
cases. Table 7 shows the difference between Tables
5 and 6. DTPs improved the results of BOW by
about five points. The best result is DTP with ESK.
However, we have to admit that the improvements
are relatively small for single document data. On
the other hand, the Tree Kernel did not work well,
since it is too sensitive to slight differences. This is known
as a weak point of the Tree Kernel (Suzuki et al., 2003).
According to the tables, BOW is outperformed by
the other methods except Tree Kernel. These results
show that word co-occurrence is important. More-
over, we see that sequential patterns, which allow
gaps, are better than consecutive patterns such as N-grams.
Without DTPs, ESK is worse than WSK. How-
ever, ESK becomes better than WSK when we use
DTPs. This result implies that word senses are dis-
ambiguated by syntactic information, but more ex-
amination is needed.
5.2 Multiple Document Summarization Data
Table 8 shows the results of the baseline method
with the best threshold for multiple document data, while Ta-
ble 9 shows the results of using DTPs with the best n
and λ (in brackets). Compared with the single doc-
ument summarization results, the F-measures are
low. This means that the sentence alignment task
is more difficult in multiple document summariza-
tion than in single document summarization. This
is because sentence compaction, combination, and
integration are common.
Although the results show the same tendency as
the single document summarization case, more im-
provements are noticed. Table 10 shows the differ-
ence between Tables 8 and 9. We see improvements
of more than 10 points for ESK, WSK, and BOW. In multiple
document summarization, sentences are often reor-
ganized. Therefore, it is more effective to decom-
pose a sentence into DTP sets and to compute simi-
larity between the DTPs.
Moreover, DTP(ESK) is once again superior to
DTP(WSK).
5.3 Parameter Tuning
For ESK and WSK, we have to choose the parameters
n and λ. However, we do not know an easy way
of finding the best combination of n and λ. There-
fore, we tuned these parameters on a development
set. The experimental results show that the best n is
2 or 3. However, we could not find a consistently
optimal value of λ. Figure 5 shows the F-measure
for various λ with n = 2. The results show that the
F-measure does not change very much for λ in the middle
range [0.4, 0.6], which suggests that good results
can be obtained with a middle-range λ.
[Figure: F-measure (0.70-0.86) plotted against λ (0.05-1.00);
six curves for B1-B3, Short and Long.]
Figure 5: F-measures with various λ values (n = 2).
6 Conclusion
This paper introduced an automatic sentence align-
ment method that integrates syntactic and semantic in-
formation. Our method transforms a sentence into
a DTP set and calculates the similarity between the
DTPs by using ESK. Experiments on the TSC (Text
Summarization Challenge) corpus, which has com-
plex correspondences, showed that the introduction
of DTP consistently improves alignment accuracy
and that ESK gives the best results.

References

M. Banko, V. Mittal, M. Kantrowitz, and J. Gold-
stein. 1999. Generating Extraction-Based Sum-
maries from Hand-Written Summaries by Align-
ing Text Spans. Proc. of the 4th Conference of
the Pacific Association for Computational Lin-
guistics.

R. Barzilay and N. Elhadad. 2003. Sentence
Alignment for Monolingual Comparable Cor-
pora. Proc. of the Empirical Methods for Natural
Language Processing 2003, pages 25–32.

N. Cancedda, E. Gaussier, C. Goutte, and J-M. Ren-
ders. 2003. Word-Sequence Kernels. Journal of
Machine Learning Research, 3(Feb):1059–1082.

M. Collins and N. Duffy. 2001. Convolution Ker-
nels for Natural Language. In Proc. of Neural In-
formation Processing Systems (NIPS’2001).

V. Hatzivassiloglou, J.L. Klavans, and E. Eskin.
1999. Detecting Text Similarity over Short Pas-
sages: Exploring Linguistic Feature Combina-
tions via Machine Learning. Proc. of the Empir-
ical Methods for Natural Language Processing
1999, pages 203–212.

V. Hatzivassiloglou, J.L. Klavans, M.L. Holcombe,
R. Barzilay, M-Y. Kan, and K. R. McKeown.
2001. SimFinder: A Flexible Clustering Tool for
Summarization. Proc. of the Workshop on Auto-
matic Summarization 2001, pages 41–49.

M-A. Hearst. 1997. TextTiling: Segmenting Text
into Multi-paragraph Subtopic Passages. Compu-
tational Linguistics, 23(1):33–64.

S. Ikehara, M. Miyazaki, S. Shirai, A. Yokoo,
H. Nakaiwa, K. Ogura, Y. Ooyama, and
Y. Hayashi. 1997. Goi-Taikei – A Japanese Lex-
icon (in Japanese). Iwanami Shoten.

H. Jing and K. McKeown. 1999. The Decom-
position of Human-Written Summary Sentences.
Proc. of the 22nd Annual International ACM-
SIGIR Conference on Research and Development
in Information Retrieval, pages 129–136.

T. Kudo and Y. Matsumoto. 2002. Japanese De-
pendency Analysis using Cascaded Chunking.
Proc. of the 6th Conference on Natural Language
Learning, pages 63–69.

H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cris-
tianini, and C. Watkins. 2002. Text Classifica-
tion using String Kernels. Journal of Machine
Learning Research, 2(Feb):419–444.

D. Marcu. 1999. The Automatic Construction
of Large-scale Corpora for Summarization Re-
search. Proc. of the 22nd Annual International
ACM-SIGIR Conference on Research and Devel-
opment in Information Retrieval, pages 137–144.

M. Okumura, T. Fukusima, and H. Nanba. 2003.
Text Summarization Challenge 2 - Text Sum-
marization Evaluation at NTCIR Workshop 3.
HLT-NAACL 2003 Workshop: Text Summariza-
tion (DUC03), pages 49–56.

K. Papineni, S. Roukos, T. Ward, and W-J. Zhu.
2002. Bleu: a Method for Automatic Evalua-
tion of Machine Translation. Proc. of the 40th
Annual Meeting of the Association for Computa-
tional Linguistics, pages 62–66.

J. Suzuki, T. Hirao, Y. Sasaki, and E. Maeda.
2003. Hierarchical Directed Acyclic Graph Ker-
nel: Methods for Structured Natural Language
Data. Proc. of the 41st Annual Meeting of the
Association for Computational Linguistics, pages
32–39.
