Probabilistic Sentence Reduction Using Support Vector Machines
Minh Le Nguyen, Akira Shimazu, Susumu Horiguchi
Bao Tu Ho and Masaru Fukushi
Japan Advanced Institute of Science and Technology
1-8, Tatsunokuchi, Ishikawa, 923-1211, JAPAN
fnguyenml, shimazu, hori, bao, mfukushig@jaist.ac.jp
Abstract
This paper investigates a novel application of sup-
port vector machines (SVMs) for sentence reduction.
We also propose a new probabilistic sentence reduc-
tion method based on support vector machine learn-
ing. Experimental results show that the proposed
methods outperform earlier methods in term of sen-
tence reduction performance.
1 Introduction
The most popular methods of sentence reduc-
tion for text summarization are corpus based
methods. Jing (Jing 00) developed a method
to remove extraneous phrases from sentences
by using multiple sources of knowledge to de-
cide which phrases could be removed. However,
while this method exploits a simple model for
sentence reduction by using statistics computed
from a corpus, a better model can be obtained
by using a learning approach.
Knight and Marcu (Knight and Marcu 02)
proposed a corpus based sentence reduction
method using machine learning techniques.
They discussed a noisy-channel based approach
and a decision tree based approach to sentence
reduction. Their algorithms provide the best
way to scale up the full problem of sentence re-
duction using available data. However, these al-
gorithms require that the word order of a given
sentence and its reduced sentence are the same.
Nguyen and Horiguchi (Nguyen and Horiguchi
03) presented a new sentence reduction tech-
nique based on a decision tree model without
that constraint. They also indicated that se-
mantic information is useful for sentence reduc-
tion tasks.
The major drawback of previous works on
sentence reduction is that those methods are
likely to output local optimal results, which may
have lower accuracy. This problem is caused by
the inherent sentence reduction model; that is,
only a single reduced sentence can be obtained.
As pointed out by Lin (Lin 03), the best sen-
tence reduction output for a single sentence is
not approximately best for text summarization.
This means that \local optimal" refers to the
best reduced output for a single sentence, while
the best reduced output for the whole text is
\global optimal". Thus, it would be very valu-
able if the sentence reduction task could gener-
ate multiple reduced outputs and select the best
one using the whole text document. However,
such a sentence reduction method has not yet
been proposed.
Support Vector Machines (Vapnik 95), on the
other hand, are strong learning methods in com-
parison with decision tree learning and other
learning methods (Sholkopf 97). The goal of
this paper is to illustrate the potential of SVMs
for enhancing the accuracy of sentence reduc-
tion in comparison with previous work. Accord-
ingly, we describe a novel deterministic method
for sentence reduction using SVMs and a two-
stage method using pairwise coupling (Hastie
98). To solve the problem of generating mul-
tiple best outputs, we propose a probabilistic
sentence reduction model, in which a variant of
probabilistic SVMs using a two-stage method
with pairwise coupling is discussed.
The rest of this paper will be organized as
follows: Section 2 introduces the Support Vec-
tor Machines learning. Section 3 describes the
previous work on sentence reduction and our
deterministic sentence reduction using SVMs.
We also discuss remaining problems of deter-
ministic sentence reduction. Section 4 presents
a probabilistic sentence reduction method using
support vector machines to solve this problem.
Section 5 discusses implementation and our ex-
perimental results; Section 6 gives our conclu-
sions and describes some problems that remain
to be solved in the future.
2 Support Vector Machine
Support vector machine (SVM)(Vapnik 95) is a
technique of machine learning based on statisti-
cal learning theory. Suppose that we are given
l training examples (xi;yi), (1 • i • l), where
xi is a feature vector in n dimensional feature
space, yi is the class label f-1, +1 g of xi. SVM
flnds a hyperplane w:x + b = 0 which correctly
separates the training examples and has a max-
imum margin which is the distance between two
hyperplanes w:x+b ‚ 1 and w:x+b •¡1: The
optimal hyperplane with maximum margin can
be obtained by solving the following quadratic
programming.
min 12 kwk+C0
lP
i
»i
s:t: yi(w:xi +b) ‚ 1¡»i
»i ‚ 0
(1)
where C0 is the constant and »i is a slack vari-
able for the non-separable case. In SVM, the
optimal hyperplane is formulated as follows:
f(x) = sign
ˆ lX
1
fiiyiK(xi;x)+b
!
(2)
where fii is the Lagrange multiple, and
K(x0;x00) is a kernel function, the SVM calcu-
lates similarity between two arguments x0 and
x00. For instance, the Polynomial kernel func-
tion is formulated as follow:
K(x0;x00) = (x0:x00)p (3)
SVMs estimate the label of an unknown ex-
ample x whether the sign of f(x) is positive or
not.
3 Deterministic Sentence Reduction
Using SVMs
3.1 Problem Description
In the corpus-based decision tree approach, a
given input sentence is parsed into a syntax tree
and the syntax tree is then transformed into a
small tree to obtain a reduced sentence.
Let t and s be syntax trees of the original sen-
tence and a reduced sentence, respectively. The
process of transforming syntax tree t to small
tree s is called \rewriting process" (Knight and
Marcu 02), (Nguyen and Horiguchi 03). To
transform the syntax tree t to the syntax tree
s; some terms and flve rewriting actions are de-
flned.
An Input list consists of a sequence of words
subsumed by the tree t where each word in the
Input list is labelled with the name of all syntac-
tic constituents in t. Let CSTACK be a stack
that consists of sub trees in order to rewrite a
small tree. Let RSTACK be a stack that con-
sists of sub trees which are removed from the
Input list in the rewriting process.
† SHIFT action transfers the flrst word from the
Input list into CSTACK. It is written mathe-
matically and given the label SHIFT.
† REDUCE(lk;X) action pops the lk syntactic
trees located at the top of CSTACK and com-
bines them in a new tree, where lk is an integer
and X is a grammar symbol.
† DROP X action moves subsequences of words
that correspond to syntactic constituents from
the Input list to RSTACK.
† ASSIGN TYPE X action changes the label of
trees at the top of the CSTACK. These POS
tags might be difierent from the POS tags in
the original sentence.
† RESTORE X action takes the X element in
RSTACK and moves it into the Input list,
where X is a subtree.
For convenience, let conflguration be a status
of Input list, CSTACK and RSTACK. Let cur-
rent context be the important information in a
conflguration. The important information are
deflned as a vector of features using heuristic
methods as in (Knight and Marcu 02), (Nguyen
and Horiguchi 03).
The main idea behind deterministic sentence
reduction is that it uses a rule in the current
context of the initial conflguration to select a
distinct action in order to rewrite an input sen-
tence into a reduced sentence. After that, the
current context is changed to a new context and
the rewriting process is repeated for selecting
an action that corresponds to the new context.
The rewriting process is flnished when it meets
a termination condition. Here, one rule corre-
sponds to the function that maps the current
context to a rewriting action. These rules are
learned automatically from the corpus of long
sentences and their reduced sentences (Knight
and Marcu 02), (Nguyen and Horiguchi 03).
3.2 Example
Figure 1 shows an example of applying a se-
quence of actions to rewrite the input sentence
(a;b;c;d;e), when each character is a word. It
illustrates the structure of the Input list, two
stacks, andthetermofarewritingprocessbased
on the actions mentioned above. For example,
in the flrst row, DROP H deletes the sub-tree
with its root node H in the Input list and stores
it in the RSTACK. The reduced tree s can be
obtained after applying a sequence of actions
as follows: DROP H; SHIFT; ASSIGN TYPE K;
DROP B; SHIFT; ASSIGN TYPE H; REDUCE 2
F; RESTORE H; SHIFT; ASSIGN TYPE D; RE-
DUCE 2G. In this example, the reduced sentence
is (b;e;a).
Figure 1: An Example of the Rewriting Process
3.3 Learning Reduction Rules Using
SVMs
As mentioned above, the action for each conflg-
uration can be decided by using a learning rule,
which maps a context to an action. To obtain
such rules, the conflguration is represented by
a vector of features with a high dimension. Af-
ter that, we estimate the training examples by
using several support vector machines to deal
with the multiple classiflcation problem in sen-
tence reduction.
3.3.1 Features
One important task in applying SVMs to text
summarization is to deflne features. Here, we
describe features used in our sentence reduction
models.
The features are extracted based on the cur-
rent context. As it can be seen in Figure 2, a
context includes the status of the Input list and
the status of CSTACK and RSTACK. We de-
flne a set of features for a current context as
described bellow.
Operation feature
The set of features as described in (Nguyen and
Horiguchi 03) are used in our sentence reduction
models.
Original tree features
These features denote the syntactic constituents
Figure 2: Example of Conflguration
that start with the flrst unit in the Input list.
For example, in Figure 2 the syntactic con-
stituents are labels of the current element in the
Input list from \VP" to the verb \convince".
Semantic features
The following features are used in our model as
semantic information.
† Semantic information about current words
within the Input list; these semantic types
are obtained by using the named entities such
as Location, Person, Organization and Time
within the input sentence. To deflne these
name entities, we use the method described in
(Borthwick 99).
† Semantic information about whether or not the
word in the Input list is a head word.
† Word relations, such as whether or not a word
has a relationship with other words in the sub-
categorization table. These relations and the
sub-categorization table are obtained using the
Commlex database (Macleod 95).
Using the semantic information, we are able to
avoid deleting important segments within the
given input sentence. For instance, the main
verb, the subject and the object are essential
and for the noun phrase, the head noun is essen-
tial, but an adjective modifler of the head noun
is not. For example, let us consider that the
verb \convince" was extracted from the Com-
lex database as follows.
convince
NP-PP: PVAL (\of")
NP-TO-INF-OC
This entry indicates that the verb \convince"
can be followed by a noun phrase and a preposi-
tional phrase starting with the preposition \of".
It can be also followed by a noun phrase and a
to-inflnite phrase. This information shows that
we cannot delete an \of" prepositional phrase
or a to-inflnitive that is the part of the verb
phrase.
3.3.2 Two-stage SVM Learning using
Pairwise Coupling
Using these features we can extract training
data for SVMs. Here, a sample in our training
data consists of pairs of a feature vector and
an action. The algorithm to extract training
data from the training corpus is modifled using
the algorithm described in our pervious work
(Nguyen and Horiguchi 03).
Since the original support vector machine
(SVM) is a binary classiflcation method, while
the sentence reduction problem is formulated as
multiple classiflcation, we have to flnd a method
to adapt support vector machines to this prob-
lem. For multi-class SVMs, one can use strate-
gies such as one-vs all, pairwise comparison or
DAG graph (Hsu 02). In this paper, we use the
pairwise strategy, which constructs a rule for
discriminating pairs of classes and then selects
the class with the most winning among two class
decisions.
To boost the training time and the sentence
reduction performance, we propose a two-stage
SVM described below.
Suppose that the examples in training data
are divided into flve groups m1;m2;:::;m5 ac-
cording to their actions. Let Svmc be multi-
class SVMs and let Svmc-i be multi-class SVMs
for a group mi: We use one Svmc classifler to
identify the group to which a given context e
should be belong. Assume that e belongs to
the group mi: The classifler Svmc-i is then used
to recognize a speciflc action for the context e:
The flve classiflers Svmc-1, Svmc-2,..., Svmc-5
are trained by using those examples which have
actions belonging to SHIFT, REDUCE, DROP,
ASSIGN TYPE and RESTORE.
Table 1 shows the distribution of examples in
flve data groups.
3.4 Disadvantage of Deterministic
Sentence Reductions
The idea of the deterministic algorithm is to
use the rule for each current context to select
the next action, and so on. The process termi-
nates when a stop condition is met. If the early
steps of this algorithm fail to select the best ac-
Table 1: Distribution of example data on flve
data groups
Name Number of examples
SHIFT-GROUP 13,363
REDUCE-GROUP 11,406
DROP-GROUP 4,216
ASSIGN-GROUP 13,363
RESTORE-GROUP 2,004
TOTAL 44,352
tions, then the possibility of obtaining a wrong
reduced output becomes high.
Onewaytosolvethisproblemistoselectmul-
tiple actions that correspond to the context at
each step in the rewriting process. However,
the question that emerges here is how to deter-
mine which criteria to use in selecting multiple
actions for a context. If this problem can be
solved, then multiple best reduced outputs can
be obtained for each input sentence and the best
one will be selected by using the whole text doc-
ument.
In the next section propose a model for se-
lecting multiple actions for a context in sentence
reduction as a probabilistic sentence reduction
and present a variant of probabilistic sentence
reduction.
4 Probabilistic Sentence Reduction
Using SVM
4.1 The Probabilistic SVM Models
Let A be a set of k actions A =
fa1;a2:::ai;:::;akg and C be a set of n con-
texts C = fc1;c2:::ci;:::;cng: A probabilistic
model fi for sentence reduction will select an
action a 2 A for the context c with probability
pfi(ajc). The pfi(ajc) can be used to score ac-
tion a among possible actions A depending the
context c that is available at the time of deci-
sion. There are several methods for estimating
such scores; we have called these \probabilistic
sentence reduction methods". The conditional
probability pfi(ajc) is estimated using a variant
of probabilistic support vector machine, which
is described in the following sections.
4.1.1 Probabilistic SVMs using
Pairwise Coupling
For convenience, we denote uij = p(a = aija =
ai_aj;c): Given a context c and an action a; we
assume that the estimated pairwise class prob-
abilities rij of uij are available. Here rij can
be estimated by some binary classiflers. For
instance, we could estimate rij by using the
SVM binary posterior probabilities as described
in (Plat 2000). Then, the goal is to estimate
fpigki=1 ; where pi = p(a = aijc); i = 1;2;:::;k:
For this propose, a simple estimate of these
probabilities can be derived using the following
voting method:
pi = 2
X
j:j6=i
Ifrij>rjig=k(k¡1)
where I is an indicator function and k(k¡1) is
the number of pairwise classes. However, this
model is too simple; we can obtain a better one
with the following method.
Assume that uij are pairwise probabilities of
the model subject to the condition that uij =
pi=(pi+pj):In (Hastie 98), the authors proposed
to minimize the Kullback-Leibler (KL) distance
between the rij and uij
l(p) =
X
i6=j
nijrijlogriju
ij
(4)
where rij and uij are the probabilities of a pair-
wise ai and aj in the estimated model and in
our model, respectively, and nij is the number
of training data in which their classes are ai or
aj: To flnd the minimizer of equation (6), they
flrst calculate
@l(p)
@pi =
X
i6=j
nij(¡rijp
i
+ 1p
i +pj
):
Thus, letting ¢l(p) = 0, they proposed to flnd
a point satisfying
X
j:j6=i
nijuij =
X
j:j6=i
nijrij;
kP
i=1
pi = 1;
where i = 1;2;:::k and pi > 0:
Such a point can be obtained by using an algo-
rithm described elsewhere in (Hastie 98). We
applied it to obtain a probabilistic SVM model
for sentence reduction using a simple method as
follows. Assume that our class labels belong to
l groups: M = fm1;m2:::mi;:::;mlg; where l
is a number of groups and mi is a group e.g.,
SHIFT, REDUCE ,..., ASSIGN TYPE. Then
the probability p(ajc) of an action a for a given
context c can be estimated as follows.
p(ajc) = p(mijc)£p(ajc;mi) (5)
where mi is a group and a 2 mi: Here, p(mijc)
and p(ajc;mi) are estimated by the method in
(Hastie 98).
4.2 Probabilistic sentence reduction
algorithm
After obtaining a probabilistic model p; we then
use this model to deflne function score, by which
the search procedure ranks the derivation of in-
complete and complete reduced sentences. Let
d(s) = fa1;a2;:::adg be the derivation of a small
tree s; where each action ai belongs to a set of
possible actions. The score of s is the product
of the conditional probabilities of the individual
actions in its derivation.
Score(s) =
Y
ai2d(s)
p(aijci) (6)
where ci is the context in which ai was decided.
The search heuristic tries to flnd the best re-
duced tree s⁄ as follows:
s⁄ = argmax| {z }
s2tree(t)
Score(s) (7)
where tree(t) are all the complete reduced trees
from the tree t of the given long sentence. As-
sume that for each conflguration the actions
fa1;a2;:::ang are sorted in decreasing order ac-
cording to p(aijci); in which ci is the context
of that conflguration. Algorithm 1 shows a
probabilistic sentence reduction using the top
K-BFS search algorithm. This algorithm uses
a breadth-flrst search which does not expand
the entire frontier, but instead expands at most
the top K scoring incomplete conflgurations in
the frontier; it is terminated when it flnds M
completed reduced sentences (CL is a list of re-
duced trees), or when all hypotheses have been
exhausted. A conflguration is completed if and
only if the Input list is empty and there is one
tree in the CSTACK. Note that the function
get-context(hi;j) obtains the current context of
the jth conflguration in hi; where hi is a heap at
step i: The function Insert(s,h) ensures that the
heap h is sorted according to the score of each
element in h: Essentially, in implementation we
can use a dictionary of contexts and actions ob-
served from the training data in order to reduce
the number of actions to explore for a current
context.
5 Experiments and Discussion
We used the same corpus as described in
(Knight and Marcu 02), which includes 1,067
pairs of sentences and their reductions. To
evaluate sentence reduction algorithms, we ran-
domly selected 32 pairs of sentences from our
parallel corpus, which is refered to as the test
corpus. The training corpus of 1,035 sentences
extracted 44,352 examples, in which each train-
ing example corresponds to an action. The
SVM tool, LibSVM (Chang 01) is applied to
train our model. The training examples were
Algorithm 1 A probabilistic sentence reduction
algorithm
1: CL=fEmptyg;
i = 0; h0=f Initial conflgurationg
2: while jCLj < M do
3: if hi is empty then4:
break;5:
end if6:
u =min(jhij;K)
7: for j = 1 to u do8:
c=get-context(hi;j)
9: Select m so that
mP
i=1
p(aijc) < Q is maximal
10: for l=1 to m do11:
parameter=get-parameter(al);
12: Obtain a new conflguration s by performing action al
with parameter
13: if Complete(s) then
14: Insert(s;CL)
15: else16:
Insert(s;hi+1)
17: end if18:
end for19:
end for20:
i = i+ 1
21: end while
divided into SHIFT, REDUCE, DROP, RE-
STORE, and ASSIGN groups. To train our
support vector model in each group, we used
the pairwise method with the polynomial ker-
nel function, in which the parameter p in (3)
and the constant C0 in equation (1) are 2 and
0:0001, respectively.
The algorithms (Knight and Marcu 02) and
(Nguyen and Horiguchi 03) served as the base-
line1 and the baseline2 for comparison with the
proposed algorithms. Deterministic sentence re-
duction using SVM and probabilistic sentence
reductionwerenamedasSVM-DandSVMP,re-
spectively. For convenience, the ten top reduced
outputs using SVMP were called SVMP-10. We
used the same evaluation method as described
in (Knight and Marcu 02) to compare the pro-
posed methods with previous methods. For this
experiment, we presented each original sentence
in the test corpus to three judges who are spe-
cialists in English, together with three sentence
reductions: the human generated reduction sen-
tence, the outputs of the proposed algorithms,
and the output of the baseline algorithms.
The judges were told that all outputs were
generated automatically. The order of the out-
puts was scrambled randomly across test cases.
The judges participated in two experiments. In
the flrst, they were asked to determine on a scale
from 1 to 10 how well the systems did with re-
spect to selecting the most important words in
the original sentence. In the second, they were
asked to determine the grammatical criteria of
reduced sentences.
Table 2 shows the results of English language
sentence reduction using a support vector ma-
chine compared with the baseline methods and
with human reduction. Table 2 shows compres-
sion rates, and mean and standard deviation re-
sults across all judges, for each algorithm. The
results show that the length of the reduced sen-
tences using decision trees is shorter than using
SVMs, and indicate that our new methods out-
perform the baseline algorithms in grammatical
and importance criteria. Table 2 shows that the
Table 2: Experiment results with Test Corpus
Method Comp Gramma Impo
Baseline1 57.19% 8:60§2:8 7:18§1:92
Baseline2 57.15% 8:60§2:1 7:42§1:90
SVM-D 57.65% 8:76§1:2 7:53§1:53
SVMP-10 57.51% 8:80§1:3 7:74§1:39
Human 64.00% 9:05§0:3 8:50§0:80
flrst 10 reduced sentences produced by SVMP-
10 (the SVM probabilistic model) obtained the
highest performances. We also compared the
computation time of sentence reduction using
support vector machine with that in previous
works. Table 3 shows that the computational
times for SVM-D and SVMP-10 are slower than
baseline, but it is acceptable for SVM-D.
Table 3: Computational times of performing re-
ductions on test-set. Average sentence length
was 21 words.
Method Computational times (sec)
Baseline1 138.25
SVM-D 212.46
SVMP-10 1030.25
We also investigated how sensitive the pro-
posed algorithms are with respect to the train-
ing data by carrying out the same experi-
ment on sentences of difierent genres. We
created the test corpus by selecting sentences
from the web-site of the Benton Foundation
(http://www.benton.org). The leading sen-
tences in each news article were selected as the
most relevant sentences to the summary of the
news. We obtained 32 leading long sentences
and 32 headlines for each item. The 32 sen-
tences are used as a second test for our methods.
We use a simple ranking criterion: the more the
words in the reduced sentence overlap with the
words in the headline, the more important the
sentence is. A sentence satisfying this criterion
is called a relevant candidate.
For a given sentence, we used a simple
method, namely SVMP-R to obtain a re-
duced sentence by selecting a relevant candi-
date among the ten top reduced outputs using
SVMP-10.
Table 4 depicts the experiment results for
the baseline methods, SVM-D, SVMP-R, and
SVMP-10. The results shows that, when ap-
plied to sentence of a difierent genre, the per-
formance of SVMP-10 degrades smoothly, while
the performance of the deterministic sentence
reductions (the baselines and SVM determinis-
tic) drops sharply. This indicates that the prob-
abilistic sentence reduction using support vector
machine is more stable.
Table 4 shows that the performance of
SVMP-10 is also close to the human reduction
outputs and is better than previous works. In
addition, SVMP-R outperforms the determin-
istic sentence reduction algorithms and the dif-
ferences between SVMP-R’s results and SVMP-
10 are small. This indicates that we can ob-
tain reduced sentences which are relevant to the
headline, while ensuring the grammatical and
the importance criteria compared to the origi-
nal sentences.
Table 4: Experiment results with Benton Cor-
pus
Method Comp Gramma Impo
Baseline1 54.14% 7:61§2:10 6:74§1:92
Baseline2 53.13% 7:72§1:60 7:02§1:90
SVM-D 56.64% 7:86§1:20 7:23§1:53
SVMP-R 58.31% 8:25§1:30 7:54§1:39
SVMP-10 57.62% 8:60§1:32 7:71§1:41
Human 64.00% 9:01§0:25 8:40§0:60
6 Conclusions
We have presented a new probabilistic sentence
reduction approach that enables a long sentence
to be rewritten into reduced sentences based on
support vector models. Our methods achieves
better performance when compared with earlier
methods. The proposed reduction approach can
generate multiple best outputs. Experimental
resultsshowedthatthetop10reducedsentences
returned by the reduction process might yield
accuracies higher than previous work. We be-
lieve that a good ranking method might improve
the sentence reduction performance further in a
text.

References

A. Borthwick, \A Maximum Entropy Approach
to Named Entity Recognition", Ph.D the-
sis, Computer Science Department, New York
University (1999).

C.-C. Chang and C.-J. Lin, \LIB-
SVM: a library for support vec-
tor machines", Software available at
http://www.csie.ntu.edu.tw/ cjlin/libsvm.

H. Jing, \Sentence reduction for automatic
text summarization", In Proceedings of the
First Annual Meeting of the North Ameri-
can Chapter of the Association for Compu-
tational Linguistics NAACL-2000.

T.T. Hastie and R. Tibshirani, \Classiflcation
by pairwise coupling", The Annals of Statis-
tics, 26(1): pp. 451-471, 1998.

C.-W. Hsu and C.-J. Lin, \A comparison of
methods for multi-class support vector ma-
chines", IEEE Transactions on Neural Net-
works, 13, pp. 415-425, 2002.

K. Knight and D. Marcu, \Summarization be-
yond sentence extraction: A Probabilistic ap-
proach to sentence compression", Artiflcial
Intelligence 139: pp. 91-107, 2002.

C.Y. Lin, \Improving Summarization Perfor-
mance by Sentence Compression | A Pi-
lot Study", Proceedings of the Sixth Inter-
national Workshop on Information Retrieval
with Asian Languages, pp.1-8, 2003.

C. Macleod and R. Grishman, \COMMLEX
syntax Reference Manual"; Proteus Project,
New York University (1995).

M.L. Nguyen and S. Horiguchi, \A new sentence
reduction based on Decision tree model",
Proceedings of 17th Paciflc Asia Conference
on Language, Information and Computation,
pp. 290-297, 2003

V. Vapnik, \The Natural of Statistical Learning
Theory", New York: Springer-Verlag, 1995.

J. Platt,\ Probabilistic outputs for support vec-
tor machines and comparison to regularized
likelihood methods," in Advances in Large
Margin Classiflers, Cambridege, MA: MIT
Press, 2000.

B. Scholkopf et al, \Comparing Support Vec-
tor Machines with Gausian Kernels to Radius
Basis Function Classifers", IEEE Trans. Sig-
nal Procesing, 45, pp. 2758-2765, 1997.
