Proceedings of the Workshop on Task-Focused Summarization and Question Answering, pages 1–7,
Sydney, July 2006. c©2006 Association for Computational Linguistics
Dimensionality Reduction Aids Term Co-Occurrence Based
Multi-Document Summarization
Ben Hachey, Gabriel Murray & David Reitter
School of Informatics
University of Edinburgh
2 Buccleuch Place, Edinburgh EH8 9LW
bhachey@inf.ed.ac.uk, gabriel.murray@ed.ac.uk, dreitter@inf.ed.ac.uk
Abstract
A key task in an extraction system for
query-oriented multi-document summari-
sation, necessary for computing relevance
and redundancy, is modelling text seman-
tics. In the Embra system, we use a repre-
sentation derived from the singular value
decomposition of a term co-occurrence
matrix. We present methods to show the
reliability of performance improvements.
We find that Embra performs better with
dimensionality reduction.
1 Introduction
We present experiments on the task of query-
oriented multi-document summarisation as ex-
plored in the DUC 2005 and DUC 2006 shared
tasks, which aim to model real-world complex
question-answering. Input consists of a detailed
query1 and a set of 25 to 50 relevant docu-
ments. We implement an extractive approach
where pieces of the original texts are selected to
form a summary and then smoothing is performed
to create a discursively coherent summary text.
The key modelling task in the extraction phase
of such a system consists of estimating responsive-
ness to the query and avoiding redundancy. Both
of these are often approached through some tex-
tual measure of semantic similarity. In the Embra2
system, we follow this approach in a sentence ex-
traction framework. However, we model the se-
mantics of a sentence using a very large distri-
butional semantics (i.e. term co-occurrence) space
reduced by singular value decomposition. Our hy-
1On average, queries contain approximately 34 words
words and three sentences.
2Edinburgh Multi-document Breviloquence Assay
pothesis is that this dimensionality reduction us-
ing a large corpus can outperform a simple term
co-occurrence model.
A number of papers in the literature look at sin-
gular value decomposition and compare it to unre-
duced term × document or term co-occurrence
matrix representations. These explore varied tasks
and obtain mixed results. For example, Peder-
sen et al. (2005) find that SVD does not improve
performance in a name discrimination task while
Matveeva et al. (2005) and Rohde et al. (In prep)
find that dimensionality reduction with SVD does
help on word similarity tasks.
The experiments contained herein investigate
the contribution of singular value decomposition
on the query-oriented multi-document summarisa-
tion task. We compare the singular value decom-
position of a term co-occurrence matrix derived
from a corpus of approximately 100 million words
(DS+SVD) to an unreduced version of the matrix
(DS). These representations are described in Sec-
tion 2. Next, Section 3 contains a discussion of
related work using SVD for summarisation and a
description of the sentence selection component in
the Embra system. The paper goes on to give an
overview of the experimental design and results in
Section 4. This includes a detailed analysis of the
statistical significance of the results.
2 Representing Sentence Semantics
The following three subsections discuss various
ways of representing sentence meaning for infor-
mation extraction purposes. While the first ap-
proach relies solely on weighted term frequencies
in a vector space, the subsequent methods attempt
to use term context information to better represent
the meanings of sentences.
1
2.1 Terms and Term Weighting (TF.IDF)
The traditional model for measuring semantic sim-
ilarity in information retrieval and text mining is
based on a vector representation of the distribution
of terms in documents. Within the vector space
model, each term is assigned a weight which sig-
nifies the semantic importance of the term. Often,
tf.idf is used for this weight, which is a scheme
that combines the importance of a term within the
current document3 and the distribution of the term
across the text collection. The former is often
represented by the term frequency and the latter
by the inverse document frequency (idfi = Ndfi ),
where N is the number of documents and dfi is
the number of documents containing term ti.
2.2 Term Co-occurrence (DS)
Another approach eschews the traditional vector
space model in favour of the distributional seman-
tics approach. The DS model is based on the in-
tuition that two words are semantically similar if
they appear in a similar set of contexts. We can
obtain a representation of a document’s semantics
by averaging the context vectors of the document
terms. (See Besanc¸on et al. (1999), where the DS
model is contrasted with a term × document vec-
tor space representation.)
2.3 Singular Value Decomposition
(DS+SVD)
Our third approach uses dimensionality reduction.
Singular value decomposition is a technique for
dimensionality reduction that has been used ex-
tensively for the analysis of lexical semantics un-
der the name of latent semantic analysis (Landauer
et al., 1998). Here, a rectangular (e.g., term ×
document) matrix is decomposed into the product
of three matrices (Xw×p = Ww×nSn×n(Pp×n)T)
with n ‘latent semantic’ dimensions. W and P
represent terms and documents in the new space.
And S is a diagonal matrix of singular values in
decreasing order.
Taking the product Ww×kSk×k(Pp×k)T over
the first k columns gives the best least square ap-
proximation of the original matrix X by a matrix
of rank k, i.e. a reduction of the original matrix to
k dimensions. Similarity between documents can
then be computed in the space obtained by taking
the rank k product of S and P.
3The local importance of a term can also be computed
over other textual units, e.g. sentence in extractive summari-
sation or the context of an entity pair in relation discovery.
This decomposition abstracts away from terms
and can be used to model a semantic similarity
that is more linguistic in nature. Furthermore, it
has been successfully used to model human intu-
itions about meaning. For example, Landauer et
al. (1998) show that latent semantic analysis cor-
relates well with human judgements of word sim-
ilarity and Foltz (1998) shows that it is a good es-
timator for textual coherence.
It is hoped that these latter two techniques (di-
mensionality reduction and the DS model) will
provide for a more robust representation of term
contexts and therefore better representation of sen-
tence meaning, enabling us to achieve more reli-
able sentence similarity measurements for extrac-
tive summarisation.
3 SVD in Summarisation
This section describes ways in which SVD has
been used for summarisation and details the im-
plementation in the Embra system.
3.1 Related Work
In seminal work by Gong and Liu (2001), the au-
thors proposed that the rows of PT may be re-
garded as defining topics, with the columns rep-
resenting sentences from the document. In their
SVD method, summarisation proceeds by choos-
ing, for each row in PT, the sentence with the
highest value. This process continues until the de-
sired summary length is reached.
Steinberger and Jeˇzek (2004) have offered two
criticisms of the Gong and Liu approach. Firstly,
the method described above ties the dimension-
ality reduction to the desired summary length.
Secondly, a sentence may score highly but never
“win” in any dimension, and thus will not be ex-
tracted despite being a good candidate. Their solu-
tion is to assign each sentence an SVD-based score
using:
ScSVDi =
radicaltpradicalvertex
radicalvertexradicalbt nsummationdisplay
i=1
v(i,k)2 ∗σ(k)2 ,
where v(i,k) is the kth element of the ith sen-
tence vector and σ(k) is the corresponding singu-
lar value.
Murray et al. (2005a) address the same concerns
but retain the Gong and Liu framework. Rather
than extracting the best sentence for each topic,
the n best sentences are extracted, with n deter-
mined by the corresponding singular values from
2
matrix S. Thus, dimensionality reduction is no
longer tied to summary length and more than one
sentence per topic can be chosen.
A similar approach in DUC 2005 using term
co-occurrence models and SVD was presented by
Jagarlamudi et al. (2005). Their system performs
SVD over a term × sentence matrix and combines
a relevance measurement based on this representa-
tion with relevance based on a term co-occurrence
model by a weighted linear combination.
3.2 Sentence Selection in Embra
The Embra system developed for DUC 2005 at-
tempts to derive more robust representations of
sentences by building a large semantic space us-
ing SVD on a very large corpus. While researchers
have used such large semantic spaces to aid in au-
tomatically judging the coherence of documents
(Foltz et al., 1998; Barzilay and Lapata, 2005), to
our knowledge this is a novel technique in sum-
marisation.
Using a concatenation of Aquaint and DUC
2005 data (100+ million words), we utilised the
Infomap tool4 to build a semantic model based on
singular value decomposition (SVD). The decom-
position and projection of the matrix to a lower-
dimensionality space results in a semantic model
based on underlying term relations. In the current
experiments, we set dimension of the reduced rep-
resentation to 100. This is a reduction of 90% from
the full dimensionality of 1000 content-bearing
terms in the original DS matrix. This was found
to perform better than 25, 50, 250 and 500 dur-
ing parameter optimisation. A given sentence is
represented as a vector which is the average of its
constituent word vectors. This sentence represen-
tation is then fed into an MMR-style algorithm.
MMR (Maximal Marginal Relevance) is a com-
mon approach for determining relevance and re-
dundancy in multi-document summarisation, in
which candidate sentences are represented as
weighted term-frequency vectors which can thus
be compared to query vectors to gauge similarity
and already-extracted sentence vectors to gauge
redundancy, via the cosine of the vector pairs
(Carbonell and Goldstein, 1998). While this has
proved successful to a degree, the sentences are
represented merely according to weighted term
frequency in the document, and so two similar sen-
tences stand a chance of not being considered sim-
4http://infomap.stanford.edu/
for each sentence in document:
for each word in sentence:
get word vector from semantic model
average word vectors to form sentence vector
sim1 = cossim(sentence vector, query vector)
sim2 = highest(cossim(sentence vector, all extracted vectors))
score = λ*sim1 - (1-λ)*sim2
extract sentence with highest score
repeat until desired length
Figure 1: Sentence extraction algorithm
ilar if they do not share the same terms.
Our implementation of MMR (Figure 1) uses λ
annealing following (Murray et al., 2005a). λ de-
creases as the summary length increases, thereby
emphasising relevance at the outset but increas-
ingly prioritising redundancy removal as the pro-
cess continues.
4 Experiment
The experimental setup uses the DUC 2005 data
(Dang, 2005) and the Rouge evaluation met-
ric to explore the hypothesis that query-oriented
multi-document summarisation using a term co-
occurrence representation can be improved using
SVD. We frame the research question as follows:
Does SVD dimensionality reduction
lead to an increase in Rouge score com-
pared to the DS representation?
4.1 Materials
The DUC 2005 task5 was motivated by Amigo et
al.’s (2004) suggestion of evaluations that model
real-world complex question answering. The goal
is to synthesise a well-organised, fluent answer of
no more than 250 words to a complex question
from a set of 25 to 50 relevant documents. The
data includes a detailed query, a document set, and
at least 4 human summaries for each of 50 topics.
The preprocessing was largely based on LT TTT
and LT XML tools (Grover et al., 2000; Thomp-
son et al., 1997). First, we perform tokenisation
and sentence identification. This is followed by
lemmatisation.
At the core of preprocessing is the LT TTT
program fsgmatch, a general purpose transducer
which processes an input stream and adds annota-
tions using rules provided in a hand-written gram-
mar file. We also use the statistical combined part-
of-speech (POS) tagger and sentence boundary
disambiguation module from LT TTT (Mikheev,
5http://www-nlpir.nist.gov/projects/
duc/duc2005/tasks.html
3
1997). Using these tools, we produce an XML
markup with sentence and word elements. Further
linguistic markup is added using the morpha lem-
matiser (Minnen et al., 2000) and the C&C named
entity tagger (Curran and Clark, 2003) trained on
the data from MUC-7.
4.2 Methods
The different system configurations (DS,
DS+SVD, TF.IDF) were evaluated against
the human upper bound and a baseline using
Rouge-2 and Rouge-SU4. Rouge estimates the
coverage of appropriate concepts (Lin and Hovy,
2003) in a summary by comparing it several
human-created reference summaries. Rouge-2
does so by computing precision and recall based
on macro-averaged bigram overlap. Rouge-SU4
allows bigrams to be composed of non-contiguous
words, with as many as four words intervening.
We use the same configuration as the official DUC
2005 evaluation,6 which is based on word stems
(rather than full forms) and uses jackknifing (k−1
cross-evaluation) so that human gold-standard and
automatic system summaries can be compared.
The independent variable in the experiment is
the model of sentence semantics used by the sen-
tence selection algorithm. We are primarily inter-
ested in the relative performance of the DS and
DS+SVD representations. As well as this, we
include the DUC 2005 baseline, which is a lead
summary created by taking the first 250 words of
the most recent document for each topic. We also
include a tf.idf -weighted term × sentence repre-
sentation (TF.IDF) for comparison with a conven-
tional MMR approach.7 Finally, we include an up-
per bound calculated using the DUC 2005 human
reference summaries. Preprocessing and all other
aspects of the sentence selection algorithm remain
constant over all systems.
In general, Rouge shows a large variance across
data sets (and so does system performance). It is
important to test whether obtained nominal differ-
ences are due to chance or are actually statistically
significant.
To test whether the Rouge metric showed a re-
liably different performance for the systems, the
6i.e. ROUGE-1.5.5.pl -n 2 -x -m -2 4 -u
-c 95 -r 1000 -f A -p 0.5 -t 0 d
7Specifically, we use tfi,j ∗ log( N
dfi ) for term weighting
where tfi,j is the number of times term i occurs in sentence
j, N is the number of sentences, and dfi is the number of
sentences containing term i.
p Metric hypothesis
0.000262 Rouge-2 base<TF.IDF ***
0.021640 Rouge-2 base<DS *
0.000508 Rouge-2 base<DS+SVD ***
0.014845 Rouge-2 DS<TF.IDF *
0.507702 Rouge-2 TF.IDF<DS+SVD
0.047016 Rouge-2 DS<DS+SVD *
0.000080 Rouge-SU4 base<TF.IDF ***
0.006803 Rouge-SU4 base<DS **
0.000006 Rouge-SU4 base<DS+SVD ***
0.012815 Rouge-SU4 DS<TF.IDF *
0.320083 Rouge-SU4 TF.IDF<DS+SVD
0.001053 Rouge-SU4 DS<DS+SVD **
Table 1: Holm-corrected Wilcoxon hypothesis test
results.
Friedman rank sum test (Friedman, 1940; Demˇsar,
2006) can be used. This is a hypothesis test not
unlike an ANOVA, however, it is non-parametric,
i.e. it does not assume a normal distribution of
the measures (i.e. precision, recall and F-score).
More importantly, it does not require homogene-
ity of variances.
To (partially) rank the systems against each
other, we used a cascade of Wilcoxon signed ranks
tests. These tests are again non-parametric (as they
rank the differences between the system results for
the datasets). As discussed by Demˇsar (2006), we
used Holm’s procedure for multiple tests to correct
our error estimates (p).
4.3 Results
Friedman tests for each Rouge metric (with
F-score, precision and recall included as ob-
servations, with the dataset as group) showed
a reliable effect of the system configuration
(χ2F,SU4 = 106.6, χ2P,SU4 = 96.1,
χ2R,SU4 = 105.5, all p < 0.00001).
Post-hoc analysis (Wilcoxon) showed (see Ta-
ble 1) that all three systems performed reliably
better than the baseline. TF.IDF performed bet-
ter than simple DS in Rouge-2 and Rouge-SU4.
DS+SVD performed better than DS (p2 < 0.05,
pSU4 < 0.005). There is no evidence to support
a claim that DS+SVD performed differently from
TF.IDF.
However, when we specifically compared the
performance of TF.IDF and DS+SVD with the
Rouge-SU4 F score for only the specific (as
opposed to general) summaries, we found that
DS+SVD scored reliably, but only slightly better
4
baseline
TF.IDF MMR
DS MMR
DS+SVD MMR
Human
Mean Rouge F−Scores          
0.00 0.05 0.10 0.15
Figure 2: Mean system performance over 50
datasets (F-scores). Precision and Recall look
qualitatively similar.
(Wilcoxon, p<0.05). This result is unadjusted,
and post-hoc comparisons with other scores or for
the general summaries did not show reliable dif-
ferences.
Having established the reliable performance im-
provement of DS+SVD over DS, it it important
to take the effect size into consideration (with
enough data, small effects may be statistically sig-
nificant, but practically unimportant). Figure 2 il-
lustrates that the gain in mean performance is sub-
stantial. If the mean Rouge-SU4 score for human
performance is seen as upper bound, the DS+SVD
system showed a 25.4 percent reduction in error
compared to the DS system.8
A similar analysis for precision and recall gives
qualitatively comparable results.
5 Discussion and Future Work
The positive message from the experimental re-
sults is that SVD dimensionality reduction im-
proves performance over a term co-occurrence
model for computing relevance and redundancy in
a MMR framework. We note that we cannot con-
clude that the DS or DS+SVD systems outper-
form a conventional tf.idf -weighted term × sen-
tence representation on this task. However, results
from Jagarlamudi et al. (2005) suggest that the DS
and term × sentence representations may be com-
plementary in which case we would expect a fur-
ther improvement through an ensemble technique.
Previous results comparing SVD with unre-
duced representations show mixed results. For
example, Pedersen et al. (2005) experiment with
term co-occurrence representations with and with-
out SVD on a name discrimination task and find
8Pairwise effect size estimates over datasets aren’t sensi-
ble. Averaging of differences between pairs was affected by
outliers, presumably caused by Rouge’s error distribution.
that the unreduced representation tends to perform
better. Rohde et al. (In prep), on the other hand,
find that a reduced matrix does perform better on
word pair similarity and multiple-choice vocabu-
lary tests. One crucial factor here may be the size
of the corpus. SVD may not offer any reliable ‘la-
tent semantic’ advantage when the corpus is small,
in which case the efficiency gain from dimension-
ality reduction is less of a motivation anyway.
We plan to address the question of corpus size
in future work by comparing DS and DS+SVD
derived from corpora of varying size. We hypoth-
esise that the larger the corpus used to compile
the term co-occurrence information, the larger the
potential contribution from dimensionality reduc-
tion. This will be explored by running the experi-
ment described in this paper a number of times us-
ing corpora of different sizes (e.g. 0.5m, 1m, 10m
and 100m words).
Unlike official DUC evaluations, which rely on
human judgements of readability and informative-
ness, our experiments rely solely on Rouge n-
gram evaluation metrics. It has been shown in
DUC 2005 and in work by Murray et al. (2005b;
2006) that Rouge does not always correlate well
with human evaluations, though there is more sta-
bility when examining the correlations of macro-
averaged scores. Rouge suffers from a lack of
power to discriminate between systems whose per-
formance is judged to differ by human annotators.
Thus, it is likely that future human evaluations
would be more informative. Another way that the
evaluation issue might be addressed is by using an
annotated sentence extraction corpus. This could
proceed by comparing gold standard alignments
between abstract and full document sentences with
predicted alignments using correlation analysis.
6 Conclusions
We have presented experiments with query-
oriented multi-document summarisation. The ex-
periments explore the question of whether SVD
dimensionality reduction offers any improvement
over a term co-occurrence representation for sen-
tence semantics for measuring relevance and re-
dundancy. While the experiments show that
our system does not outperform a term × sen-
tence tf.idf system, we have shown that the SVD
reduced representation of a term co-occurrence
space built from a large corpora performs better
than the unreduced representation. This contra-
5
dicts related work where SVD did not provide
an improvement over unreduced representations
on the name discrimination task (Pedersen et al.,
2005). However, it is compatible with other work
where SVD has been shown to help on the task
of estimating human notions of word similarity
(Matveeva et al., 2005; Rohde et al., In prep).
A detailed analysis using the Friedman test and
a cascade of Wilcoxon signed ranks tests suggest
that our results are statistically valid despite the
unreliability of the Rouge evaluation metric due to
its low variance across systems.
Acknowledgements
This work was supported in part by Scottish Enter-
prise Edinburgh-Stanford Link grant R36410 and,
as part of the EASIE project, grant R37588. It
was also supported in part by the European Union
6th FWP IST Integrated Project AMI (Augmented
Multiparty Interaction, FP6-506811, publication).
We would like to thank James Clarke for de-
tailed comments and discussion. We would also
like to thank the anonymous reviewers for their
comments.

References
Enrique Amigo, Julio Gonzalo, Victor Peinado,
Anselmo Penas, and Felisa Verdejo. 2004. An
empirical study of information synthesis tasks. In
Proceedings of the 42nd Annual Meeting of the As-
sociation for Computational Linguistics, Barcelona,
Spain.
Regina Barzilay and Mirella Lapata. 2005. Modeling
local coherence: an entity-based approach. In Pro-
ceedings of the 43rd Annual Meeting of the Associa-
tion for Computational Linguistics, Ann Arbor, MI,
USA.
Romaric Besanc¸on, Martin Rajman, and Jean-C´edric
Chappelier. 1999. Textual similarities based on
a distributional approach. In Proceedings of the
10th International Workshop on Database And Ex-
pert Systems Applications, Firenze, Italy.
Jaime G. Carbonell and Jade Goldstein. 1998. The
use of mmr, diversity-based reranking for reordering
documents and producing summaries. In Proceed-
ings of the 21st Annual International ACM SIGIR
Conference on Research and Development in Infor-
mation Retrieval, Melbourne, Australia.
James R. Curran and Stephen Clark. 2003. Language
independent NER using a maximum entropy tag-
ger. In Proceedings of the 2003 Conference on Com-
putational Natural Language Learning, Edmonton,
Canada.
Hoa T. Dang. 2005. Overview of DUC 2005. In
Proceedings of the Document Understanding Con-
ference, Vancouver, B.C., Canada.
Janez Demˇsar. 2006. Statistical comparisons of clas-
sifiers over multiple data sets. Journal of Machine
Learning Research, 7:1–30, Jan.
Peter W. Foltz, Walter Kintsch, and Thomas K. Lan-
dauer. 1998. The measurement of textual coherence
with latent semantic analysis. Discourse Processes,
25.
Milton Friedman. 1940. A comparison of alternative
tests of significance for the problem of m rankings.
The Annals of Mathematical Statistics, 11:86–92.
Yihon Gong and Xin Liu. 2001. Generic text summa-
rization using relevance measure and latent semantic
analysis. In Proceedings of the 24th Annual Interna-
tional ACM SIGIR Conference on Research and De-
velopment in Information Retrieval, New Orleans,
LA, USA.
Claire Grover, Colin Matheson, Andrei Mikheev, and
Marc Moens. 2000. LT TTT—a flexible tokeni-
sation tool. In Proceedings of the 2nd International
Conference on Language Resources and Evaluation,
Athens, Greece.
Ben Hachey and Claire Grover. 2004. A rhetorical
status classifier for legal text summarisation. In
Proceedings of the ACL-2004 Text Summarization
Branches Out Workshop, Barcelona, Spain.
Jagadeesh Jagarlamudi, Prasad Pingali, and Vasudeva
Varma. 2005. A relevance-based language mod-
eling approach to DUC 2005. In Proceedings of
the Document Understanding Conference, Vancou-
ver, B.C., Canada.
Thomas K. Landauer, Peter W. Foltz, and Darrell La-
ham. 1998. Introduction to latent semantic analysis.
Discourse Processes, 25.
Chin-Yew Lin and Eduard H. Hovy. 2003. Au-
tomatic evaluation of summaries using n-gram
co-occurrence statistics. In Proceedings of the
Joint Human Language Technology Conference and
North American Chapter of the Association for
Computational Linguistics Annual Meeting, Edmon-
ton, Alberta, Canada.
Irina Matveeva, Gina-Anne Levow, Ayman Farahat,
and Christiaan Royer. 2005. Term represetation
with generalized latent semantic analysis. In Pro-
ceedings of the 2005 Conference on Recent Ad-
vances in Natural Language Processing, Borovets,
Bulgaria.
Andrei Mikheev. 1997. Automatic rule induction for
unknown word guessing. Computational Linguis-
tics, 23(3).
Guido Minnen, John Carroll, and Darren Pearce. 2000.
Robust, applied morphological generation. In Pro-
ceedings of the 1st International Natural Language
Generation Conference, Mitzpe Ramon, Israel.
Gabriel Murray, Steve Renals, and Jean Carletta.
2005a. Extractive summarization of meeting record-
ings. In Proceedings of the 9th European Con-
ference on Speech Communication and Technology,
Lisbon, Portugal.
Gabriel Murray, Steve Renals, Jean Carletta, and Jo-
hanna Moore. 2005b. Evaluating automatic sum-
maries of meeting recordings. In Proceedings of the
43rd Annual Meeting of the Association for Compu-
tational Linguistics, Ann Arbor, MI, USA.
Gabriel Murray, Steve Renals, Jean Carletta, and Jo-
hanna Moore. 2006. Incorporating speaker and
discourse features into speech summarization. In
Proceedings of the Joint Human Language Technol-
ogy Conference and North American Chapter of the
Association for Computational Linguistics Annual
Meeting, New York City, NY, USA.
Ted Pedersen, Amruta Purandare, and Anagha Kulka-
rni. 2005. Name discrimination by clustering simi-
lar contexts. In Proceedings of the 6th International
Conference on Intelligent Text Processing and Com-
putational Linguistics, Mexico City, Mexico.
Douglas L. T. Rohde, Laur M. Gonnerman, and
David C. Plaut. In prep. An improved
method for deriving word meaning from lexical
co-occurrence. http://dlt4.mit.edu/˜dr/
COALS/Coals.pdf (1 May 2006).
Josef Steinberger and Karel Jeˇzek. 2004. Using latent
semantic analysis in text summarization and sum-
mary evaluation. In Proceedings of the 5th Inter-
national Conference on Information Systems Imple-
mentation and Modelling, Ostrava, Czech Republic.
Henry Thompson, Richard Tobin, David McK-
elvie, and Chris Brew. 1997. LT XML:
Software API and toolkit for XML processing.
http://www.ltg.ed.ac.uk/software/.
