A Multiclassifier based Document Categorization System: profiting from
the Singular Value Decomposition Dimensionality Reduction Technique
Ana Zelaia
UPV-EHU
Basque Country
ccpzejaa@si.ehu.es
Iñaki Alegria
UPV-EHU
Basque Country
acpalloi@si.ehu.es
Olatz Arregi
UPV-EHU
Basque Country
acparuro@si.ehu.es
Basilio Sierra
UPV-EHU
Basque Country
ccpsiarb@si.ehu.es
Abstract
In this paper we present a multiclassifier approach for multilabel
document classification problems, where a set of k-NN classifiers is
used to predict the category of text documents based on different
training subsampling databases. These databases are obtained from the
original training database by random subsampling. In order to combine
the predictions generated by the multiclassifier, Bayesian voting is
applied. Throughout the classification process, a reduced dimension
vector representation obtained by Singular Value Decomposition (SVD)
is used for training and testing documents. The good results of our
experiments give an indication of the potential of the proposed
approach.
1 Introduction
Document Categorization, the assignment of nat-
ural language texts to one or more predefined
categories based on their content, is an impor-
tant component in many information organization
and management tasks. Researchers have con-
centrated their efforts in finding the appropriate
way to represent documents, index them and con-
struct classifiers to assign the correct categories to
each document. Both the document representation
and the classification method are crucial steps in the
categorization process.
In this paper we concentrate on both issues. On the one hand, we use
Latent Semantic Indexing (LSI) (Deerwester et al., 1990), which is a
variant of the vector space model (VSM) (Salton and McGill, 1983), in
order to obtain the vector representation of documents. This technique
compresses vectors representing documents into vectors of a
lower-dimensional space. LSI, which is based on the Singular Value
Decomposition (SVD) of matrices, has been shown to extract the
relations among words and documents by means of their context of use,
and has been successfully applied to Information Retrieval tasks.
On the other hand, we construct a multiclassi-
fier (Ho et al., 1994) which uses different train-
ing databases. These databases are obtained from
the original training set by random subsampling.
We implement this approach by bagging, and use
the k-NN classification algorithm to make the cat-
egory predictions for testing documents. Finally,
we combine all predictions made for a given doc-
ument by Bayesian voting.
The experiment we present has been evaluated on the Reuters-21578
standard document collection. Reuters-21578 is a multilabel document
collection, which means that categories are not mutually exclusive
because the same document may be relevant to more than one category.
Given the results published in the most recent literature and the good
results obtained in our experiments, we consider the categorization
method presented in this paper an interesting contribution to text
categorization tasks.
The remainder of this paper is organized as follows: Section 2
discusses related work on document categorization for the Reuters-21578
collection. In Section 3, we present our approach to the multilabel
text categorization task. In Section 4 the experimental setup is
introduced, and details about the Reuters database, the preprocessing
applied and some parameter settings are provided. In Section 5,
experimental results are presented and discussed. Finally, Section 6
contains some conclusions and comments on future work.
2 Related Work
As previously mentioned in the introduction, text
categorization consists in assigning predefined
categories to text documents. In the past two
decades, document categorization has received
much attention and a considerable number of ma-
chine learning based approaches have been pro-
posed. A good tutorial on the state-of-the-art of
document categorization techniques can be found
in (Sebastiani, 2002).
In the document categorization task we can find two cases: (1) the
multilabel case, which means that categories are not mutually
exclusive, because the same document may be relevant to more than one
category (1 to m category labels may be assigned to the same document,
m being the total number of predefined categories), and (2) the
single-label case, where exactly one category is assigned to each
document. While most machine learning systems are designed to handle
multiclass data¹, much less common are systems that can handle
multilabel data.
For experimentation purposes, there are standard document collections
available in the public domain that can be used for document
categorization. The most widely used is the Reuters-21578 collection,
which is a multiclass (135 categories) and multilabel (the mean number
of categories assigned to a document is 1.2) dataset. Many experiments
have been carried out for the Reuters collection. However, they have
been performed in different experimental conditions, which makes their
results difficult to compare. In fact, effectiveness results can only
be compared between studies that use the same training and testing
sets. In order to lead researchers to use the same training/testing
divisions, the Reuters documents have been specifically tagged, and
researchers are encouraged to use one of those divisions. In our
experiment we use the “ModApte” split (Lewis, 2004).
In this section, we analyze the category subsets, evaluation measures
and results obtained in the past and in recent years for the
Reuters-21578 ModApte split.
¹ Categorization problems where there are more than two possible
categories.
2.1 Category subsets
Concerning the evaluation of the classification system, we restrict
our attention to the TOPICS group of categories that labels the Reuters
dataset, which contains 135 categories. However, many categories appear
in no document and consequently, because inductive learning classifiers
learn from training examples, these categories are not usually
considered at evaluation time. The most widely used subsets are the
following:
• Top-10: It is the set of the 10 categories
which have the highest number of documents
in the training set.
• R(90): It is the set of 90 categories which
have at least one document in the training set
and one in the testing set.
• R(115): It is the set of 115 categories which
have at least one document in the training set.
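The three subset definitions above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `category_subsets` and the toy label sets are hypothetical, and each document is assumed to be given as a set of category names (an empty set for an unlabeled document).

```python
from collections import Counter

def category_subsets(train_labels, test_labels, top_n=10):
    """Derive the three kinds of category subsets from per-document label sets.

    train_labels / test_labels: one set of category names per document.
    Returns (top, in_both, in_train).
    """
    # Number of training documents per category.
    train_counts = Counter(c for labels in train_labels for c in labels)
    test_cats = {c for labels in test_labels for c in labels}
    # Categories with at least one training document (R(115) on Reuters).
    in_train = set(train_counts)
    # Categories with at least one training and one testing document (R(90)).
    in_both = in_train & test_cats
    # The top_n categories with the most training documents (Top-10).
    top = [c for c, _ in train_counts.most_common(top_n)]
    return top, in_both, in_train

# Hypothetical toy collection, not the real Reuters label distribution.
train = [{"earn"}, {"earn"}, {"acq"}, {"grain", "wheat"}, set()]
test = [{"earn"}, {"wheat"}, {"cocoa"}]
top, in_both, in_train = category_subsets(train, test, top_n=1)
```

In the toy data, "cocoa" has a testing document but no training document, so it belongs to none of the three subsets.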
In order to analyze the relative hardness of the three category
subsets, a very recent paper has been published by Debole and
Sebastiani (Debole and Sebastiani, 2005) where a systematic,
comparative experimental study has been carried out.
The results of the classification system we pro-
pose are evaluated according to these three cate-
gory subsets.
2.2 Evaluation measures
The evaluation of a text categorization system is usually done
experimentally, by measuring the effectiveness, i.e. average
correctness of the categorization. In binary text categorization, two
known statistics are widely used to measure this effectiveness:
precision and recall. Precision (Prec) is the percentage of documents
classified into a given category that actually belong to it, and
recall (Rec) is the percentage of documents belonging to a given
category that are indeed classified into it.
In general, there is a trade-off between precision and recall. Thus, a
classifier is usually evaluated by means of a measure which combines
precision and recall. Various such measures have been proposed. The
breakeven point, the value at which precision equals recall, has been
frequently used during the past decade. However, it has been recently
criticized by its proposer (see footnote 19 in (Sebastiani, 2002)).
Nowadays, the F1 score is more frequently used. The F1 score combines
recall and precision with an equal weight in the following way:
$$F_1 = \frac{2 \cdot Prec \cdot Rec}{Prec + Rec}$$
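As a small worked example of the formula above, the following sketch computes precision, recall and F1 for one binary category from its contingency counts. The function name and the counts are illustrative assumptions, not values from our experiments.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from true positives, false positives
    and false negatives of a single binary category."""
    prec = tp / (tp + fp) if tp + fp else 0.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    # F1 gives equal weight to precision and recall.
    f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
    return prec, rec, f1

# Hypothetical counts: 8 correct assignments, 2 wrong ones, 8 missed.
prec, rec, f1 = precision_recall_f1(tp=8, fp=2, fn=8)
```

With these counts precision is 0.8 and recall is 0.5, so F1 lies between the two, closer to the lower value.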
Since precision and recall are defined only for binary classification
tasks, for multiclass problems results need to be averaged to get a
single performance value. This will be done using microaveraging and
macroaveraging. In microaveraging, which is calculated by globally
summing over all individual cases, categories count proportionally to
the number of their positive testing examples. In macroaveraging, which
is calculated by averaging over the results of the different
categories, all categories count the same. See (Debole and Sebastiani,
2005; Yang, 1999) for a more detailed explanation of the evaluation
measures mentioned above.
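The difference between the two averaging schemes can be made concrete with a short sketch. It assumes per-category contingency counts (tp, fp, fn) are available; the category names and counts are hypothetical. It uses the equivalent form F1 = 2·TP / (2·TP + FP + FN).

```python
def f1_from_counts(tp, fp, fn):
    """F1 written directly in terms of contingency counts."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_f1(counts):
    """Sum the counts over all categories, then compute a single F1."""
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    return f1_from_counts(tp, fp, fn)

def macro_f1(counts):
    """Compute F1 per category, then average with equal weights."""
    scores = [f1_from_counts(*c) for c in counts.values()]
    return sum(scores) / len(scores)

# Hypothetical (tp, fp, fn) counts: one frequent and one rare category.
counts = {"frequent": (90, 10, 10), "rare": (1, 0, 3)}
micro = micro_f1(counts)   # dominated by the frequent category
macro = macro_f1(counts)   # both categories weigh the same
```

Here the frequent category scores 0.9 and the rare one 0.4, so the macroaverage is 0.65 while the microaverage stays close to 0.89.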
2.3 Comparative Results
Sebastiani (Sebastiani, 2002) presents a table that lists the results
of experiments for various training/testing divisions of Reuters. The
results listed there are microaveraged breakeven point measures and,
consequently, are not directly comparable to the F1 scores we present
in this paper. Nevertheless, we want to highlight some of them. In
Table 1 we summarize the best results reported for the ModApte split
listed by Sebastiani.
Results reported by        R(90)    Top-10
(Joachims, 1998)           86.4     -
(Dumais et al., 1998)      87.0     92.0
(Weiss et al., 1999)       87.8     -
Table 1: Microaveraged breakeven point results reported by Sebastiani
for the Reuters-21578 ModApte split.
In Table 2 we include some more recent results, evaluated according to
the microaveraged F1 score. For R(115) there is also a good result,
F1 = 87.2, obtained by (Zhang and Oles, 2001)².
Results reported by                R(90)    Top-10
(Gao et al., 2003)                 88.42    93.07
(Kim et al., 2005)                 87.11    92.21
(Gliozzo and Strapparava, 2005)    -        92.80
Table 2: F1 results reported for the Reuters-21578 ModApte split.
² Actually, this result is obtained for 118 categories, which
correspond to the 115 mentioned before and three more categories which
have testing documents but no training document assigned.
3 Proposed Approach
In this paper we propose a multiclassifier based document
categorization system. Documents in the training and testing sets are
represented in a reduced dimensional vector space. Different training
databases are generated from the original training dataset in order to
construct the multiclassifier. We use the k-NN classification
algorithm, which makes a prediction for testing documents according to
each training database. Finally, a Bayesian voting scheme is used in
order to definitively assign category labels to testing documents.
In the rest of this section we make a brief re-
view of the SVD dimensionality reduction tech-
nique, the k-NN algorithm and the combination of
classifiers used.
3.1 The SVD Dimensionality Reduction
Technique
The classical Vector Space Model (VSM) has been successfully employed
to represent documents in text categorization tasks. The newer method
of Latent Semantic Indexing (LSI)³ (Deerwester et al., 1990) is a
variant of the VSM in which documents are represented in a lower
dimensional space created from the input training dataset. It is based
on the assumption that there is some underlying latent semantic
structure in the term-document matrix that is corrupted by the wide
variety of words used in documents. This is referred to as the problem
of polysemy and synonymy. The basic idea is that if two document
vectors represent two very similar topics, many words will co-occur in
them, and they will have very close semantic structures after
dimension reduction.
³ http://lsi.research.telcordia.com, http://www.cs.utk.edu/~lsi
The SVD technique used by LSI consists in factoring the term-document
matrix $M$ into the product of three matrices, $M = U \Sigma V^T$,
where $\Sigma$ is a diagonal matrix of singular values in
non-increasing order, and $U$ and $V$ are orthogonal matrices of
singular vectors (term and document vectors, respectively). Matrix $M$
can be approximated by a lower rank matrix $M_p$, which is calculated
by using the $p$ largest singular values of $M$. This operation is
called dimensionality reduction, and the $p$-dimensional space to which
document vectors are projected is called the reduced space. Choosing
the right dimension $p$ is required for successful application of the
LSI/SVD technique. However, since there is no theoretical optimum value
for it, potentially expensive experimentation may be required to
determine it (Berry and Browne, 1999).
For document categorization purposes (Dumais, 2004), the testing
document $q$ is also projected to the $p$-dimensional space,
$q_p = q^T U_p \Sigma_p^{-1}$, and the cosine is usually calculated to
measure the semantic similarity between training and testing document
vectors.
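The projection described above can be sketched with NumPy. This is a minimal illustration, assuming a small dense term-document matrix with raw counts; the function names `lsi_fit`, `lsi_project` and `cosine` and the toy matrix are hypothetical, and a collection the size of Reuters would normally call for a sparse, truncated SVD routine instead of `numpy.linalg.svd`.

```python
import numpy as np

def lsi_fit(M, p):
    """Truncated SVD of a term-document matrix M (terms x documents).

    Returns U_p, the p largest singular values, and the training
    documents in the reduced space (one row of V_p per document).
    """
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    return U[:, :p], s[:p], Vt[:p, :].T

def lsi_project(q, U_p, s_p):
    """Fold a document vector q into the reduced space: q^T U_p Sigma_p^{-1}."""
    return (q @ U_p) / s_p

def cosine(a, b):
    """Cosine similarity between two vectors of the reduced space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy term-document matrix: 4 terms (rows) x 3 documents (columns).
M = np.array([[1.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])
U_p, s_p, docs_p = lsi_fit(M, p=2)

# A testing document sharing its terms with the first training document.
q = np.array([1.0, 1.0, 0.0, 0.0])
q_p = lsi_project(q, U_p, s_p)
similarities = [cosine(q_p, d) for d in docs_p]
```

A useful sanity check of the fold-in formula is that projecting a training column of $M$ reproduces the corresponding row of $V_p$.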
In Figure 1 we can see an illustration of the document vector
projection. Documents in the training collection are represented by
using the term-document matrix $M$, and each one of the documents is
represented by a vector in the $R^m$ vector space, as in the
traditional vector space model (VSM) scheme. Afterwards, the dimension
$p$ is selected, and by applying SVD the vectors are projected to the
reduced space. Documents in the testing collection will also be
projected to the same reduced space.
[Figure: the Reuters-21578 ModApte training documents d1, d2, ...,
d9603, represented by the matrix $M$ in $R^m$ (VSM), are projected by
SVD to $R^p$, where $M_p = U_p \Sigma_p V_p^T$.]
Figure 1: Vectors in the VSM are projected to the reduced space by
using SVD.
3.2 The k nearest neighbor classification
algorithm (k-NN)
k-NN is a distance based classification approach. According to this
approach, given an arbitrary testing document, the k-NN classifier
ranks its nearest neighbors among the training documents, and uses the
categories of the k top-ranking neighbors to predict the categories of
the testing document (Dasarathy, 1991). In this paper, the training and
testing documents are represented as reduced dimensional vectors in the
lower dimensional space, and in order to find the nearest neighbors of
a given document, we calculate the cosine similarity measure.
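The following sketch illustrates k-NN with cosine similarity on already reduced vectors. It rests on two assumptions that the description above does not fix: each training document carries a single label, and the labels of the k nearest neighbors are combined by a plain majority vote. All names and vectors are hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(query, train_vectors, train_labels, k=3):
    """Predict a label from the k training vectors most similar to query."""
    # Rank training documents by decreasing cosine similarity.
    ranked = sorted(range(len(train_vectors)),
                    key=lambda i: cosine(query, train_vectors[i]),
                    reverse=True)
    # Assumption: majority vote over the labels of the k nearest neighbors.
    votes = Counter(train_labels[i] for i in ranked[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-dimensional reduced vectors with one label each.
train_vectors = [(1.0, 0.0), (0.9, 0.1), (0.0, 1.0), (0.1, 0.9), (0.8, 0.3)]
train_labels = ["earn", "earn", "acq", "acq", "earn"]
label_a = knn_predict((1.0, 0.2), train_vectors, train_labels, k=3)
label_b = knn_predict((0.1, 1.0), train_vectors, train_labels, k=3)
```

Because only the angle between vectors matters for the cosine, the vectors do not need to be normalized beforehand.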
In Figure 2 an illustration of this phase can be seen, where some
training documents and a testing document $q$ are projected in the
$R^p$ reduced space. The vectors nearest to the testing document $q_p$
are considered to be those which have the smallest angle with $q_p$.
According to the category labels of the nearest documents, a category
label prediction, $c$, will be made for testing document $q$.
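[Figure: training documents d34, d61, d23, d135 and d509 and the
testing document $q_p$ in the reduced space $R^p$; k-NN predicts the
category label $c$.]
Figure 2: The k-NN classifier is applied to the testing document $q_p$
and the category label $c$ is predicted.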
We have decided to use the k-NN classifier because it has been found
that on the Reuters-21578 database it performs best among the
conventional methods (Joachims, 1998; Yang, 1999) and because we have
obtained good results in our previous work on text categorization for
documents written in Basque, a highly inflected language (Zelaia et
al., 2005). Besides, the k-NN classification algorithm can be easily
adapted to multilabel categorization problems such as Reuters.
3.3 Combination of classifiers
The combination of multiple classifiers has been intensively studied
with the aim of improving the accuracy of individual components (Ho et
al., 1994). Two widely used techniques to implement this approach are
bagging (Breiman, 1996), which uses more than one model of the same
paradigm, and boosting (Freund and Schapire, 1999), in which different
weights are given to different training examples in search of better
accuracy.
In our experiment we have decided to construct a multiclassifier via
bagging. In bagging, a set of training databases $TD_i$ is generated by
selecting $n$ training examples drawn randomly with replacement from
the original training database $TD$ of $n$ examples. When a set of
$n_1$ training examples, $n_1 < n$, is chosen from the original
training collection, the bagging is said to be applied by random
subsampling. This is the approach used in our work. The $n_1$ parameter
has been selected via tuning. Section 4.3 explains the selection in
more detail.
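The two sampling schemes can be contrasted with a short sketch using only the standard library. The function names are hypothetical, and drawing the $n_1$ examples without replacement is an assumption, since the text only states that $n_1 < n$ examples are chosen.

```python
import random

def bootstrap_sample(train_db, rng):
    """Classical bagging: n examples drawn with replacement from n."""
    n = len(train_db)
    return [train_db[rng.randrange(n)] for _ in range(n)]

def random_subsample(train_db, n1, rng):
    """Random subsampling: n1 < n examples.

    Assumption: the examples are drawn without replacement.
    """
    return rng.sample(train_db, n1)

rng = random.Random(0)            # fixed seed for a repeatable sketch
train_db = list(range(1000))      # stand-in for document identifiers
bag = bootstrap_sample(train_db, rng)
# One small training database per classifier of the multiclassifier.
databases = [random_subsample(train_db, 50, rng) for _ in range(30)]
```

Each of the 30 databases is much smaller than the original one, which is what makes the later k-NN searches cheaper.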
According to the random subsampling, given a testing document $q$, the
classifier will make a label prediction $c_i$ based on each one of the
training databases $TD_i$. One way to combine the predictions is by
Bayesian voting (Dietterich, 1998), where a confidence value
$cv^i_{c_j}$ is calculated for each training database $TD_i$ and
category $c_j$ to be predicted. These confidence values have been
calculated based on the original training collection. Confidence values
are summed by category. The category $c_j$ that gets the highest value
is finally proposed as a prediction for the testing document.
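The summation step of the voting scheme can be sketched as follows. The confidence values are taken as given inputs here: how they are estimated from the original training collection is not detailed above, so the numbers below are purely hypothetical, as is the function name.

```python
from collections import defaultdict

def bayesian_vote(predictions, confidence):
    """Combine the predictions of several classifiers.

    predictions[i]  : label predicted using training database TD_i
    confidence[i][c]: confidence value of classifier i for category c
    Returns (category, summed confidence) pairs, highest total first.
    """
    totals = defaultdict(float)
    for i, label in enumerate(predictions):
        # Each classifier contributes its confidence for the label it chose.
        totals[label] += confidence[i].get(label, 0.0)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical predictions of three classifiers and their confidences.
predictions = ["earn", "acq", "earn"]
confidence = [
    {"earn": 0.9, "acq": 0.5},
    {"earn": 0.8, "acq": 0.6},
    {"earn": 0.7, "acq": 0.4},
]
ranking = bayesian_vote(predictions, confidence)
best_category = ranking[0][0]
```

Returning the whole ranking rather than only the winner keeps the runner-up available for a possible second label.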
In Figure 3 an illustration of the whole experiment can be seen.
First, vectors in the VSM are projected to the reduced space by using
SVD. Next, random subsampling is applied to the training database $TD$
to obtain different training databases $TD_i$. Afterwards the k-NN
classifier is applied for each $TD_i$ to make category label
predictions. Finally, Bayesian voting is used to combine predictions,
and $c_j$, and in some cases $c_k$ as well, will be the final category
label prediction of the categorization system for testing document $q$.
In Section 4.3 the cases when a second category label prediction $c_k$
is given are explained.
[Figure: the Reuters-21578 ModApte training documents d1, d2, ...,
d9603 and testing documents q1, q2, ..., q3299 are represented in
$R^m$ (VSM) and projected by SVD to $R^p$, with
$M_p = U_p \Sigma_p V_p^T$ and $q_p = q^T U_p \Sigma_p^{-1}$. Random
subsampling of $TD$ yields $TD_1$, $TD_2$, ..., $TD_{30}$; k-NN is
applied to $q_p$ with each of them, giving the predictions $c_1$,
$c_2$, ..., $c_{30}$, which Bayesian voting combines into $c_j$,
($c_k$).]
Figure 3: Proposed approach for multilabel document categorization
tasks.
4 Experimental Setup
The aim of this section is to describe the document collection used in
our experiment and to give an account of the preprocessing techniques
and parameter settings we have applied.
When machine learning and other approaches are applied to text
categorization problems, a common technique has been to decompose the
multiclass problem into multiple, independent binary classification
problems. In this paper, we adopt a different approach. We will be
primarily interested in a classifier which produces a ranking of
possible labels for a given document, with the hope that the
appropriate labels will appear at the top of the ranking.
4.1 Document Collection
As previously mentioned, the experiment reported in this paper has been
carried out for the Reuters-21578 dataset⁴ compiled by David Lewis and
originally collected by the Carnegie Group from the Reuters newswire in
1987. We use one of the most widely used training/testing divisions,
the “ModApte” split, in which 75% of the documents (9,603 documents)
are selected for training and the remaining 25% (3,299 documents) to
test the accuracy of the classifier.
Document distribution over categories in both the training and the
testing sets is very unbalanced: the 10 most frequent categories,
top-10, account for 75% of the training documents; the rest is
distributed among the other 108 categories.
Regarding the number of labels assigned to each document, many
documents (19% in training and 8.48% in testing) are not assigned to
any category, and some of them are assigned to 12 categories. We have
decided to keep the unlabeled documents in both the training and
testing collections, as suggested in (Lewis, 2004)⁵.
⁴ http://daviddlewis.com/resources/testcollections
⁵ In the “ModApte” Split section it is suggested as follows: “If you
are using a learning algorithm that requires each training document to
have at least one TOPICS category, you can screen out the training
documents with no TOPICS categories. Please do NOT screen out any of
the 3,299 documents - that will make your results incomparable with
other studies.”
4.2 Preprocessing
The original format of the text documents is SGML. We perform some
preprocessing to filter out the unused parts of a document. We have
preserved only the title and the body text; punctuation and numbers
have been removed and all letters have been converted to lowercase. We
have used the tools provided on the web⁶ in order to extract text and
categories from each document. We have stemmed the training and testing
documents by using the Porter stemmer (Porter, 1980)⁷, which removes
case and inflection information from words. The same experiment has
therefore been carried out for two forms of the document collection:
word-forms and Porter stems.
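The cleaning steps described above (keeping title and body, removing punctuation and numbers, lowercasing) can be sketched as follows. This is an illustration only: the function name is hypothetical, the SGML extraction is assumed to have been done already, anything outside a to z is treated as a separator, and the Porter stemming step is left out because it would require a stemmer implementation.

```python
import re

def preprocess(title, body):
    """Lowercase title and body and keep only alphabetic tokens."""
    text = f"{title} {body}".lower()
    # Assumption: every character that is not a-z or whitespace
    # (punctuation, digits, symbols) is replaced by a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    return text.split()

# Hypothetical document, not taken from the Reuters collection.
tokens = preprocess("U.S. Grain Exports Up 5%",
                    "Wheat shipments rose to 1,200 tonnes.")
```

The resulting token list corresponds to the word-form version of a document; the stemmed version would apply the stemmer to each token.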
For the dimension reduction, we have created the matrices for the two
document collection forms mentioned. The sizes of the training matrices
created are 15591 × 9603 for word-forms and 11114 × 9603 for Porter
stems. Different numbers of dimensions have been tested
($p$ = 100, 300, 500, 700).
4.3 Parameter setting
We have designed our experiment in order to optimize the microaveraged
F1 score. Based on previous experiments (Zelaia et al., 2005), we have
set parameter k for the k-NN algorithm to k = 3. This way, the k-NN
classifier will give a category label prediction based on the
categories of the 3 nearest neighbors.
On the other hand, we also needed to decide the number of training
databases $TD_i$ to create. It has to be taken into account that a
higher number of training databases implies a higher computational cost
for the final classification system. We decided to create 30 training
databases. However, this is a parameter that has not been optimized.
There are two other parameters which have been tuned: the size of each
training database and the threshold for multilabeling. We now briefly
give some details about the tuning performed.
4.3.1 The size of the training databases
As we have previously mentioned, documents have been randomly selected
from the original training database in order to construct the 30
training databases $TD_i$ used in our classification system. There are
$n$ = 9,603 documents in the original Reuters training collection. We
had to decide the number of documents to select in order to construct
each $TD_i$. The number of documents selected from each category
preserves the proportion of documents in the original one. We have
experimented with selecting different numbers $n_1 < n$ of documents,
according to the following formula:
$$n_1 = \sum_{i=1}^{115} \left( 2 + \frac{t_i}{j} \right), \quad j = 10, 20, \ldots, 70,$$
where $t_i$ is the total number of training documents in category $i$.
Figure 4 shows the variation of the $n_1$ parameter depending on the
value of parameter $j$. We have experimented with different $j$ values,
and evaluated the results. Based on the results obtained we decided to
select $j$ = 60, which means that each one of the 30 training databases
will have $n_1$ = 298 documents. As we can see, the final
classification system will be using training databases which are
considerably smaller than the original one. This gives a lower
computational cost, and makes the classification system faster.
⁶ http://www.lins.fju.edu.tw/~tseng/Collections/Reuters-21578.html
⁷ http://tartarus.org/martin/PorterStemmer/
[Figure: plot of parameter $n_1$ (vertical axis, 200 to 1100) against
parameter $j$ (horizontal axis, 10 to 70).]
Figure 4: Random subsampling rate.
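The subsampling formula can be evaluated with a few lines of Python. The sketch assumes that the division $t_i / j$ is an integer division, which the formula itself does not state, and it uses hypothetical category counts rather than the real Reuters ones.

```python
def per_category_quota(t_i, j):
    """Documents taken from a category with t_i training documents.

    Assumption: t_i / j is rounded down to an integer.
    """
    return 2 + t_i // j

def subsample_size(category_counts, j):
    """n1: the sum of the per-category quotas."""
    return sum(per_category_quota(t, j) for t in category_counts.values())

# Hypothetical training counts for three categories.
category_counts = {"a": 1200, "b": 300, "c": 45}
sizes = {j: subsample_size(category_counts, j) for j in range(10, 80, 10)}
```

The constant 2 guarantees that even very small categories contribute documents to every training database, while larger values of $j$ shrink the contribution of the frequent categories.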
4.3.2 Threshold for multilabeling
The k-NN algorithm predicts a unique category label for each testing
document, based on the ranked list of categories obtained for each
training database $TD_i$⁸. As previously mentioned, we use Bayesian
voting to combine the predictions.
⁸ It has to be noted that unlabeled documents have been preserved, and
thus, our classification system treats unlabeled documents as documents
of a new category.
Reuters-21578 is a multilabel database, and therefore, we had to decide
in which cases to assign a second category label to a testing document.
Given that $c_j$ is the category with the highest value in Bayesian
voting and $c_k$ the next one, the second category label $c_k$ will be
assigned when the following relation is true:
$$cv_{c_k} > cv_{c_j} \times r, \quad r = 0.1, 0.2, \ldots, 0.9, 1$$
In Figure 5 we can see the mean number of categories assigned to a
document for different values of $r$. Results obtained were evaluated
and based on them we decided to select $r$ = 0.4, which corresponds to
a mean of 1.05 categories per document.
[Figure: plot of the multilabeling ratio (vertical axis, 0.98 to 1.16)
against parameter $r$ (horizontal axis, 0.1 to 1).]
Figure 5: Threshold for multilabeling.
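The multilabeling rule can be sketched as a small function over the summed voting values. The function name and the voting totals are hypothetical; only the comparison of the second value against $r$ times the first one comes from the rule stated above.

```python
def assign_labels(vote_totals, r=0.4):
    """Return the winning category and, if the threshold rule holds,
    the runner-up as a second label.

    vote_totals: summed Bayesian voting value per category.
    """
    ranked = sorted(vote_totals.items(), key=lambda kv: kv[1], reverse=True)
    labels = [ranked[0][0]]
    # Second label only if its value exceeds r times the winner's value.
    if len(ranked) > 1 and ranked[1][1] > ranked[0][1] * r:
        labels.append(ranked[1][0])
    return labels

# Hypothetical voting totals for one testing document.
totals = {"grain": 10.0, "wheat": 5.0, "corn": 1.0}
two_labels = assign_labels(totals, r=0.4)   # 5.0 > 4.0
one_label = assign_labels(totals, r=0.6)    # 5.0 > 6.0 is false
```

Lower values of $r$ make a second label more likely, which is why the mean number of categories per document decreases as $r$ grows.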
5 Experimental Results
Table 3 shows the microaveraged F1 scores obtained in our experiment.
As could be expected, a simple stemming process slightly improves
results, and it can be observed that the best result for each of the
three category subsets has been obtained for the stemmed corpus, even
though the gain is low (less than 0.6).
The evaluation for the Top-10 category subset gives the best results,
reaching up to 93.57%. In fact, this is the expected behavior, as the
number of categories to be evaluated is small and the number of
documents in each category is high. For this subset the best result has
been obtained for 100 dimensions, although the variation is low among
results for 100, 300 and 500 dimensions. When using higher dimensions
results become poorer.
For the R(90) and R(115) subsets, the best results are 87.27% and
87.01% respectively. Given that the difficulty of these subsets is
quite similar, their behavior is also analogous. As we can see in the
table, most of the best results for these subsets have been obtained by
reducing the dimension of the space to 500.
                   Dimension reduction
Corpus           100      300      500      700
Words(10)       93.06    93.17    93.44    92.00
Porter(10)      93.57    93.20    93.50    92.57
Words(90)       84.90    86.71    87.09    86.18
Porter(90)      85.34    86.64    87.27    86.30
Words(115)      84.66    86.44    86.73    85.84
Porter(115)     85.13    86.47    87.01    86.00
Table 3: Microaveraged F1 scores for the Reuters-21578 ModApte split.
6 Conclusions and Future Work
In this paper we present an approach for multilabel document
categorization problems which consists in a multiclassifier system
based on the k-NN algorithm. The documents are represented in a reduced
dimensional space calculated by SVD. We want to emphasize that, due to
the multilabel character of the database used, we have adapted the
classification system in order for it to be multilabel too. The system
has been trained only once, on the 9603 training documents, and the
category label predictions made by the classifier have been evaluated
on the testing set according to the three category sets: top-10, R(90)
and R(115). The microaveraged F1 scores we obtain are among the best
reported for Reuters-21578.
As future work, we want to experiment with generating more than 30
training databases and, in a preliminary phase, selecting the best
among them. The predictions made using the selected training databases
will be combined to obtain the final predictions.
When there is a low number of documents available for a given category,
the power of LSI to create a space that reflects interesting properties
of the data is limited. As future work we want to include background
text in the training collection and use an expanded term-document
matrix that includes, besides the 9603 training documents, some other
relevant texts. This may improve results, especially for the categories
with fewer documents (Zelikovitz and Hirsh, 2001).
In order to see the consistency of our classi-
fier, we also plan to repeat the experiment for the
RCV1 (Lewis et al., 2004), a new benchmark col-
lection for text categorization tasks which consists
of 800,000 manually categorized newswire stories
recently made available by Reuters.
7 Acknowledgements
This research was supported by the University of the Basque Country
(UPV00141.226-T-15948/2004) and Gipuzkoa Council in a European Union
Program.

References
Berry, M.W. and Browne, M.: Understanding Search
Engines: Mathematical Modeling and Text Re-
trieval. SIAM Society for Industrial and Applied
Mathematics, ISBN: 0-89871-437-0, Philadelphia,
(1999)

Breiman, L.: Bagging Predictors. Machine Learning,
24(2), 123–140, (1996)

Cristianini, N., Shawe-Taylor, J. and Lodhi, H.: Latent
Semantic Kernels. Proceedings of ICML’01, 18th
International Conference on Machine Learning, 66–
73, Morgan Kaufmann Publishers, (2001)

Dasarathy, B.V.: Nearest Neighbor (NN) Norms:
NN Pattern Recognition Classification Techniques.
IEEE Computer Society Press, (1991)

Debole, F. and Sebastiani, F.: An Analysis of the Relative Hardness of
Reuters-21578 Subsets. Journal of the American Society for Information
Science and Technology, 56(6), 584–596, (2005)

Deerwester, S., Dumais, S.T., Furnas, G.W., Landauer,
T.K. and Harshman, R.: Indexing by Latent Seman-
tic Analysis. Journal of the American Society for
Information Science, 41, 391–407, (1990)

Dietterich, T.G.: Machine-Learning Research: Four
Current Directions. The AI Magazine, 18(4), 97–
136, (1998)

Dumais, S.T., Platt, J., Heckerman, D. and Sahami,
M.: Inductive Learning Algorithms and Repre-
sentations for Text Categorization. Proceedings of
CIKM’98: 7th International Conference on Infor-
mation and Knowledge Management, ACM Press,
148–155 (1998)

Dumais, S.: Latent Semantic Analysis. ARIST, Annual Review of
Information Science Technology, 38, 189–230, (2004)

Freund, Y. and Schapire, R.E.: A Short Introduction to
Boosting. Journal of Japanese Society for Artificial
Intelligence, 14(5), 771-780, (1999)

Gao, S., Wu, W., Lee, C.H. and Chua, T.S.: A Maximal Figure-of-Merit
Learning Approach to Text Categorization. Proceedings of SIGIR’03: 26th
Annual International ACM SIGIR Conference on Research and Development
in Information Retrieval, 174–181, ACM Press, (2003)

Gliozzo, A. and Strapparava, C.: Domain Kernels for Text
Categorization. Proceedings of CoNLL’05: 9th Conference on
Computational Natural Language Learning, 56–63, (2005)

Ho, T.K., Hull, J.J. and Srihari, S.N.: Decision Combi-
nation in Multiple Classifier Systems. IEEE Trans-
actions on Pattern Analysis and Machine Intelli-
gence, 16(1), 66–75, (1994)

Joachims, T.: Text Categorization with Support Vector Machines:
Learning with Many Relevant Features. Proceedings of ECML’98: 10th
European Conference on Machine Learning, Springer 1398, 137–142,
(1998)

Kim, H., Howland, P. and Park, H.: Dimension Re-
duction in Text Classification with Support Vector
Machines. Journal of Machine Learning Research,
6, 37–53, MIT Press, (2005)

Lewis, D.D.: Reuters-21578 Text Catego-
rization Test Collection, Distribution 1.0.
http://daviddlewis.com/resources/testcollections
README file (v 1.3), (2004)

Lewis, D.D., Yang, Y., Rose, T.G. and Li, F.: RCV1: A
New Benchmark Collection for Text Categorization
Research. Journal of Machine Learning Research,
5, 361–397, (2004)

Porter, M.F.: An Algorithm for Suffix Stripping. Pro-
gram, 14(3), 130–137, (1980)

Salton, G. and McGill, M.: Introduction to Modern
Information Retrieval. McGraw-Hill, New York,
(1983)

Sebastiani, F.: Machine Learning in Automated Text
Categorization. ACM Computing Surveys, 34(1),
1–47, (2002)

Weiss, S.M., Apte, C., Damerau, F.J., Johnson, D.E., Oles, F.J., Goetz,
T. and Hampp, T.: Maximizing Text-Mining Performance. IEEE Intelligent
Systems, 14(4), 63–69, (1999)

Yang, Y.: An Evaluation of Statistical Approaches to Text
Categorization. Journal of Information Retrieval. Kluwer Academic
Publishers, 1(1/2), 69–90, (1999)

Zelaia, A., Alegria, I., Arregi, O. and Sierra, B.: An-
alyzing the Effect of Dimensionality Reduction in
Document Categorization for Basque. Proceedings
of L&TC’05: 2nd Language & Technology Confer-
ence, 72–75, (2005)

Zelikovitz, S. and Hirsh, H.: Using LSI for Text
Classification in the Presence of Background Text.
Proceedings of CIKM’01: 10th ACM International
Conference on Information and Knowledge Man-
agement, ACM Press, 113–118, (2001)

Zhang, T. and Oles, F.J.: Text Categorization Based
on Regularized Linear Classification Methods. In-
formation Retrieval, 4(1): 5–31, Kluwer Academic
Publishers, (2001).
