Hierarchical Orderings of Textual Units
Alexander Mehler
University of Trier
Universit¨atsring 15
D-54286 Trier, Germany
mehler@uni-trier.de
Abstract
Text representation is a central task for any ap-
proach to automatic learning from texts. It re-
quires a format which allows to interrelate texts
even if they do not share content words, but
deal with similar topics. Furthermore, measur-
ing text similarities raises the question of how
to organize the resulting clusters. This paper
presents cohesion trees (CT) as a data structure
for the perspective, hierarchical organization of
text corpora. CTs operate on alternative text
representation models taking lexical organiza-
tion, quantitative text characteristics, and text
structure into account. It is shown that CTs
realize text linkages which are lexically more
homogeneous than those produced by minimal
spanning trees.
1 Introduction
Text representation is a central task for ap-
proaches to text classification or categorization.
They require a format which allows to seman-
tically relate words, texts, and thematic cate-
gories. The majority of approaches to automa-
tic learning from texts use the vector space or
bag of words model. Although there is much re-
search for alternative formats, whether phrase-
or hyperonym-based, their effects seem to be
small (Scott and Matwin, 1999). More serious-
ly (Riloff, 1995) argues that the bag of words
model ignores morphological and syntactical in-
formation which she found to be essential for
solving some categorization tasks. An alterna-
tive to the vector space model are semantic
spaces, which have been proposed as a high-
dimensional format for representing relations of
semantic proximity. Relying on sparse know-
ledge resources, they prove to be efficient in cog-
nitive science (Kintsch, 1998; Landauer and Du-
mais, 1997), computational linguistics (Rieger,
1984; Sch¨utze, 1998), and information retrieval.
Although semantic spaces prove to be an al-
ternative to the vector space model, they leave
the question unanswered of how to explore and
visualize similarities of signs mapped onto them.
In case that texts are represented as points in
semantic space, this question refers to the ex-
ploration of their implicit, content based rela-
tions. Several methods for solving this task have
been proposed which range from simple lists via
minimal spanning trees to cluster analysis as
part of scatter/gahter algorithms (Hearst and
Pedersen, 1996). Representing a sign’s environ-
ment in space by means of lists runs the risk of
successively ordering semantically or themati-
cally diverse units. Obviously, lists neglect the
poly-hierarchical structure of semantic spaces
which may induce divergent thematic progres-
sions starting from the same polysemous unit.
Although clustering proves to be an alternative
to lists, it seeks a global, possibly nested par-
tition in which clusters represent sets of indis-
tinguishable objects regarding the cluster crite-
rion. In contrast to this, we present cohesion
trees (CT) as a data structure, in which single
objects are hierarchically ordered on the basis of
lexical cohesion. CTs, whose field of application
is the management of search results in IR, shift
the perspective from sets of clustered objects to
cohesive paths of interlinked signs.
The paper is organized as follows: the next
section presents alternative text representation
models as extensions of the semantic space ap-
proach. They are used in section (3) as a back-
ground of the discussion of cohesion trees. Both
types of models, i.e. the text representation
models and cohesion trees as a tool for hierarchi-
cally traversing semantic spaces, are evaluated
in section (4). Finally, section (5) gives some
conclusions and prospects future work.
2 Numerical Text Representation
This paper uses semantic spaces as a format
for text representation. Although it neglects
sentence as well as rhetorical structure, it de-
parts from the bag of words model by refer-
ring to paradigmatic similarity as the funda-
mental feature type: instead of measuring in-
tersections of lexical distributions, texts are in-
terrelated on the basis of the paradigmatic regu-
larities of their constituents. A coordinate value
of a feature vector of a sign mapped onto se-
mantic space measures the extent to which this
sign (or its constituents in case of texts) shares
paradigmatic usage regularities with the word
defining the corresponding dimension. Because
of this sensitivity to paradigmatics, semantic
spaces can capture indirect meaning relations:
words can be linked even if they never co-occur,
but tend to occur in similar contexts. Further-
more, texts can be linked even if they do not
share content words, but deal with similar top-
ics (Landauer and Dumais, 1997). Using this
model as a starting point, we go a step further
in departing from the bag of words model by
taking quantitative characteristics of text struc-
ture into account (see below).
Semantic spaces focus on meaning as use
as described by the weak contextual hypothe-
sis (Miller and Charles, 1991), which says that
the similarity of contextual representations of
words contributes to their semantic similarity.
Regarding the level of texts, reformulating this
hypothesis is straightforward:
Contextual hypothesis for texts: the contextual
similarity of the lexical constituents of two texts
contributes to their semantic similarity.
In other words: the more two texts share se-
mantically similar words, the higher the proba-
bility that they deal with similar topics. Clearly,
this hypothesis does not imply that texts having
contextually similar components to a high de-
gree also share propositional content. It is the
structural (connotative), not the propositional
(denotative) meaning aspect to which this hy-
pothesis applies. Moreover, this version of the
contextual hypothesis neglects the structural di-
mension of similarity relations: not only that
a text is structured into thematic components,
each of which may semantically relate to differ-
ent units, but units similar to the text as a whole
do not form isolated, unstructured clumps. Ne-
glecting the former we focus on the latter phe-
nomenon, which demands a supplementary hy-
pothesis:
Structure sensitive contextual hypothesis: units,
which are similar to a text according to the con-
textual hypothesis, contribute to the structur-
ing of its meaning.
Since we seek a model for automatic text rep-
resentation for which nonlinguistic context is
inaccessible, we limit contextual similarity to
paradigmatic similarity. On this basis the latter
two hypotheses can be summarized as follows:
Definition 1. Let C be a corpus in which we
observe paradigmatic regularities of words. The
textual connotation of a text x with respect to
C includes those texts of C, whose constituents
realize similar paradigmatic regularities as the
lexical constituents of x. The connotation of x is
structured on the basis of the same relation of
(indirect) paradigmatic similarity interrelating
the connoted texts.
In order to model this concept of structured
connotation, we use the space model M0 of
(Rieger, 1984) as a point of departure and de-
rive three text representation models M1, M2,
M3. Since M0 only maps words onto semantic
space we extend it in order to derive meaning
points of texts. This is done as follows:
M0 analyses word meanings as the result of
a two-stage process of unsupervised learning. It
builds a lexical semantic space by modeling syn-
tagmatic regularities with a correlation coeffi-
cient α: W → C ⊂ a82n and their differences with
an Euclidean metric δ: C → S ⊂ a82n, where W
is the set of words, C is called corpus space repre-
senting syntagmatic regularities, and S is called
semantic space representing paradigmatic regu-
larities. |W| = n is the number of dimensions of
both spaces. Neighborhoods of meaning points
assigned to words model their semantic similar-
ity: the shorter the points’ distances in seman-
tic space, the more paradigmatically similar the
words.
The set of words W, spanning the semantic
space, is selected on the basis of the criterion of
document frequency, which proves to be of com-
parable effectiveness as information gain and
χ2-statistics (Yang and Pedersen, 1997). Fur-
thermore, instead of using explicit stop word
lists, we restricted W to the set of lemmatized
nouns, verbs, adjectives, and adverbs.
M1: In a second step, we use S as a format
for representing meaning points of texts, which
are mapped onto S with the help of a weighted
mean of the meaning points assigned to their
lexical constituents:
vectorxk =
summationdisplay
ai∈W(xk)
wikvectorai ∈ S (1)
vectorxk is the meaning point of text xk ∈ C, vectorai the
meaning point of word ai ∈ W, and W(xk) is
the set of all types of all tokens in xk. Finally,
wik is a weight having the same role as the tfidf-
scores in IR (Salton and Buckley, 1988). As
a result of mapping texts onto S, they can be
compared with respect to the paradigmatic sim-
ilarity of their lexical organization. This is done
with the help of a similarity measure σ based
on an Euclidean metric δ operating on meaning
points and standardized to the unit interval:
σ: {vectorx|x ∈ C}2 → [0,1] (2)
σ is interpreted as follows: the higher σ(vectorx,vectory)
for two texts x and y, the shorter the distance
of their meaning points vectorx and vectory in semantic
space, the more similar the paradigmatic usage
regularities of their lexical constituents, and fi-
nally the more semantically similar these texts
according to the extended contextual hypothe-
sis. This is the point, where semantic spaces
depart from the vector space model, since they
do not demand that the texts in question share
any lexical constituents in order to be similar;
the intersection of the sets of their lexical con-
stituents may even be empty.
M2: So far, only lexical features are consid-
ered. We depart a step further from the bag
of words model by additionally comparing texts
with respect to their organization. This is done
with the help of a set of quantitative text char-
acteristics used by (Tuldava, 1998) for auto-
matic genre analysis: type-token ratio, hapax
legomena, (variation of) mean word frequency,
average sentence length, and action coefficient
(i.e. the standardized ratio of verbs and adjec-
tives in a text). In order to make these fea-
tures comparable, they were standardized us-
ing z-scores so that random variables were de-
rived with means of 0 and variances of 1. Be-
yond these characteristics, a further feature was
considered: each text was mapped onto a so
called text structure string representing its divi-
sion into sections, paragraphs, and sentences as
a course approximation of its rhetorical struc-
ture. For example, a text structure string
(T(D(S))(D(S −S −S))) (3)
denotes a text T of two sections D, where the
first includes 1 and the second 3 sentences S.
Using the Levenshtein metric for string compa-
rison, this allows to measure the rhetorical simi-
larity of texts in a first approximation. The idea
is to distinguish units connoted by a text, which
in spite of having similar lexical organizations
differ texturally. If for example a short com-
mentary connotes two equally similar texts, an-
other commentary and a long report, the com-
mentary should be preferred. Thus, in M2 the
textual connotation of a text is not only seen to
be structured on the basis of the criterion of sim-
ilarity of lexical organization, but also by means
of genre specific features modeled as quantita-
tive text characteristics. This approach follows
(Herdan, 1966), who programmatically asked,
whether difference in style correlates with dif-
ference in frequency of use of linguistic forms.
See (Wolters and Kirsten, 1999) who, following
this approach, already used POS frequency as a
source for genre classification, a task which goes
beyond the scope of the given paper.
On this background a compound text similar-
ity measure can be derived as a linear model:
σ(x,y) =
3summationdisplay
i=1
ωiσi(x,y) ∈ [0,1] (4)
a. where σ1(x,y) = σ(vectorx,vectory) models lexical se-
mantics of texts x,y according to M1;
b. σ2 uses the Levenshtein metric for measur-
ing the similarity of the text structure stings
assigned to x and y;
c. and σ3 measures, based on an Euclidean me-
tric, the similarity of texts with respect to
the quantitative features enumerated above.
ωi biases the contribution of these different di-
mensions of text representation. We yield good
results for ω1 = 0.9, ω2 = ω3 = 0.05.
M3: Finally, we experimented with a text
representation model resulting from the aggre-
gation (i.e. weighted mean) of the vector repre-
sentations of a text in both spaces, i.e. vector
and semantic space. This approach, which de-
mands both spaces to have exactly the same
dimensions and standardized coordinate values,
follows the idea to reduce the noise inherent to
both models: whether syntagmatic as in case
of vector spaces, or paradigmatic as in case of
semantic spaces. We experimented with equal
weights of both input vectors.
In the next section we use the text represen-
tation models M1, M2, M3 as different starting
points for modeling the concept of structured
connotation as defined in definition (1):
3 Text Linkage
Departing from ordinary list as well as cluster
structures, we model the connotation of a text
as a hierarchy, where each node represents a sin-
gle connoted text (and not a set of texts as in
case of agglomerative cluster analysis). In or-
der to narrow down a solution for this task we
need a linguistic criterion, which bridges be-
tween the linguistic knowledge represented in
semantic spaces and the task of connotative text
linkage. For this purpose we refer to the con-
cept of lexical cohesion introduced by (Halliday
and Hasan, 1976); see (Morris and Hirst, 1991;
Hearst, 1997; Marcu, 2000) who already use this
concept for text segmentation. According to
this approach, lexical cohesion results from re-
iterating words, which are semantically related
on the basis of (un-)systematic relations (e.g.
synonymy or hyponymy). Unsystematic lexi-
cal cohesion results from patterns of contextual,
paradigmatic similarity: “[...] lexical items
having similar patterns of collocation—that is,
tending to appear in similar contexts—will gen-
erate a cohesive force if they occur in adjacent
sentences.” (Halliday and Hasan, 1976, p. 286).
Several factors influencing this cohesive force
are decisive for reconstructing the concept of
textual connotation:(i) the contextual similarity
of the words in question, (ii) their syntagmatic
order, and (iii) the distances of their occurren-
ces. These factors cooperate as follows: the
shorter the distance of similar words in a text
the higher their cohesive force. Furthermore,
preceding lexical choices restrict (the interpre-
tation of) subsequent ones, an effect, which re-
tards as their distance grows. But longer dis-
tances may be compensated by higher contex-
tual similarities so that highly related words can
contribute to the cohesion of a text span even
if they distantly co-occur. By means of restrict-
ing contextual to paradigmatic similarity and
therefore measuring unsystematic lexical cohe-
sion as a function of paradigmatic regularities,
the transfer of this concept to the task of hierar-
chically modeling textual connotations becomes
straightforward. Given a text x, whose connota-
tion is to be represented as a tree T, we demand
for any path P starting with root x:
(i) Similarity: If text y is more similar to x
than z, then the path between x and y is
shorter than between x and z, supposed
that y and z belong to the same path P.
(ii) Order: The shorter the distance between y
and z in P, the higher their cohesive force,
and vice versa: the longer the path, the
higher the probability that the subsequent
z is paradigmatically dissimilar to y.
(iii) Distance: A cohesive impact is preserved
even in case of longer paths, supposed that
the textual nodes lying in between are
paradigmatically similar to a high degree.
The reason underlying these criteria is the
need to control negative effects of intransitive
similarity relations: in case that text x is highly
similar to y, and y to z, it is not guaranteed that
(x,y,z) is a cohesive path, since similarity is not
transitive. In order to reduce this risk of incohe-
sive paths, the latter criteria demand that there
is a cohesive force even between nodes which are
not immediately linked. This demand decreases
as the path distance of nodes increases so that
topic changes latently controlled by preceding
nodes can be realized. In other words: adding
text z to the hierarchically structured connota-
tion of x, we do not simply look for an already
inserted text y, to which z is most similar, but
to a path P, which minimizes the loss of cohe-
sion in the overall tree, when z is attached to P.
These comments induce an optimality criterion
which tries to optimize cohesion not only of di-
rectly linked nodes, but of whole paths, thereby
reflecting their syntagmatic order. Looking for
a mathematical model of this optimality crite-
rion, minimal spanning trees (MST) drop out,
since they only optimize direct node-to-node
similarities disregarding any path context. Fur-
thermore, whereas we expect to yield differ-
ent trees modeling the connotations of differ-
ent texts, MSTs ignore this aspect dependency
since they focus on a unique spanning tree of
the underlying feature space. Another candi-
date is given by dependency trees (Rieger, 1984)
which are equal to similarity trees (Lin, 1998):
for a given root x, the nodes are inserted into
its similarity tree (ST) in descending order of
their similarity to x, where the predecessor of
any node z is chosen to be the node y already
inserted, to which z is most similar. Although
STs already capture the aspect dependency in-
duced by their varying roots, the path criterion
is still not met. Thus, we generalize the concept
of a ST to that of a cohesion tree as follows:
First, we observe that the construction of STs
uses two types of order relations: the first, let
it call ≤1x, determines the order of the nodes
inserted dependent on root x; the second, let
it call ≤2y, varies with node y to be inserted
and determines its predecessor. Next, in order
to build cohesion trees out of this skeleton, we
instantiate all relations ≤2y in a way, which finds
the path of minimal loss of cohesion when y is
attached to it. This is done with the help of
a distance measure which induces a descending
order of cohesion of paths:
Definition 2. Let G = 〈V,E〉 be a graph and
P = (v1,...,vk) a simple path in G. The path
sensitive distance ˆδ(P,y) of y ∈ V with respect
to P is defined as
ˆδ(P,y) = 1
max(δ)
summationdisplay
vi∈V (P)
ωiδ(vectory,vectorvi) ∈ [0,1],
wheresummationtextvi∈V (P) ωi ≤ 1, max(δ) is the maximal
value assumed by distance measure δ, and V(P)
is the set of all nodes of path P.
It is clear that for any of the text representa-
tion models M1, M2, M3 and their correspond-
ing similarity measures we get different distance
measures ˆδ which can be used to instantiate the
order relations ≤2y in order to determine the end
vertex of the path of minimal loss of cohesion
when y is attached to it. In case of increasing
biases ωi for increasing index i in definition (2)
the syntagmatic order of path P is reflected in
the sense that the shorter the distance of x to
any vertex in P, the higher the impact of their
(dis-)similarity measured by δ, the higher their
cohesive force. Using the relations ≤2y we can
now formalize the concept of a cohesion tree:
Definition 3. Let G = 〈V,E,ω〉 be a complete
weighted graph induced by a semantic space,
and x ∈ V a node. The graph D(G,x) =
〈V,E,ν〉 with E = {{v,w}|v <1x w ∧ ¬∃y ∈ V :
y <1x w∧y <2w v} and ν: E → a82, the restriction
of ω to E, is called cohesion tree induced by x.
Using this definition of a cohesion tree (CT)
we can compute hierarchical models of the con-
notations of texts, in which not only aspect de-
pendency induced by the corresponding root,
but also path cohesion is taken into account.
A note on the relation between CTs and clus-
ter analysis: CTs do not only depart from clus-
ter hierarchies, since their nodes represent sin-
gle objects, and not sets, but also because they
refer to a local, contextsensitive building crite-
rion (with respect to their roots and paths). In
contrast to this, cluster analysis tries to find a
global partition of the data set. Nevertheless
there is a connection between both methods of
unsupervised learning: Given a MST, there is
a simple procedure to yield a divisive partition
(Duda et al., 2001). Moreover, single linkage
graphs are based on a comparable criterion as
MSTs. Analogously, a given CT can be divided
into non-overlapping clusters by deleting those
edges whose length is above a certain threshold.
This induces, so to say, perspective clusters or-
ganized dependent on the perspective of the root
and paths of the underlying CT.
4 Evaluation
Figure (1) exemplifies a CT based on M3 using a
textual root dealing with the “BSE Food Scan-
dal” from 1996. The text sample belongs to a
corpus of 502 texts of the German newspaper
S¨uddeutsche Zeitung of about 320,000 run-
ning words. Each text belongs to an element of
a set T of 18 different subject categories (e.g.
politics, sports). Based on the lemmatized cor-
pus a semantic space of 2715 lexical dimensions
was built and all texts were mapped onto this
space according to the specifications of M3. In
figure (1) each textual node of the CT is rep-
resented by its headline and subject category
as found in the newspaper. All computations
Figure 1: A sample CT.
were performed using a set of C++ programs es-
pecially implemented for this study.
In order to rate models M1, M2, M3 in com-
parison to the vector space model (VS) using
MSTs, STs and CTs as alternative hierarchi-
cal models we proceed as follows: as a simple
measure of representational goodness we com-
pute the average categorial cohesion of links of
all MSTs, STs and CTs for the different models
and all texts in the corpus. Let G = 〈V,E〉 be
a tree of textual nodes x ∈ V, each of which is
assigned to a subject category τ(x) ∈ T , and
P(G) the set of all paths in G starting with
root x and ending with a leaf, then the cate-
gorial cohesion of G is the average number of
links (vi,vj) ∈ E per path P ∈ P(G), where
τ(vi) = τ(vj). The more nodes of identical cat-
egories are linked in paths in G, the more cat-
egorially homogeneous these paths, the higher
the average categorial cohesion of G. According
to the conceptual basis of CTs we expect these
trees to be of highest categorial link cohesion,
but this is not true: MSTs produce the highest
cohesion values in case of VS and M3. Further-
more, we observe that model M3 induces trees
of highest cohesion and lowest variance, whereas
VS shows the highest variance and lowest cohe-
sion scores in case of STs and CTs. In other
words: based on semantic spaces, models M1,
M2, and M3 produce more stable results than
the vector space model.
Using M3 as a starting point it can be
asked more precisely, which tree class produces
the most cohesive model of text connotation.
Clearly, the measure of categorial link cohesion
is not sufficient to evaluate the classes, since two
immediately linked texts belonging to the same
Model MSTs STs CTs
VS 1325.88 462.04 598.87
M1 1093.06 680.06 1185.92
M2 1097.39 661.72 1168.63
M3 1488.38 628.51 1032.55
Table 1: Alternative representation models and
scores of trees derived from them.
subject category may nevertheless deal with dif-
ferent topics. Thus we need a finer-grained
measure which operates directly on the texts’
meaning representations. In case of unsuper-
vised clustering, where fine-grained class labels
are missed, (Steinbach et al., 2000) propose a
measure which estimates the overall cohesion of
a cluster. This measure can be directly applied
to trees: let Pv1,vn = (v1,...,vn) be a path in
tree G = 〈V,E〉 starting with root v1 = x, we
compute the cohesion of P irrespective of the
order of its nodes as follows:
ξ(Pv1,vn) = 1− 1n2
nsummationdisplay
i,j=1
1
max(δ)δ(vi,vj) (5)
The more similar the nodes of path P accor-
ding to metric δ, the more cohesive P. δ is
derived from the distance measure operating
on the semantic space to which texts vi are
mapped. As before, all scores ξ(P) are summed
up for all paths in P(G) and standardized by
means of |P(G)|. This guarantees that neither
trees of maximum height (MHT) nor of max-
imum degree (MDT), i.e. trees which trivially
correspond to lists, are assigned highest cohe-
sion values. The results of summing up these
scores for all trees of a given class for all texts
in the test corpus are shown in table (2). Now,
Type summationtextξ(G) Type summationtextξ(G)
MDT 388.1 MST 416.3
MHT 388.1 DT 430.9
RST 386.6 CT 438.6
Table 2: The sum of the cohesion scores for all
tree classes and all texts in the test corpus.
CTs and STs realize the most cohesive struc-
tures. This is more obvious if the scores ξ(G)
are compared for each text in separation: in
494 cases, CTs are of highest cohesion accord-
ing to measure (5). In only 7 cases, MST are
of highest cohesion, and in only one case, the
corresponding ST is of highest cohesion. More-
over, even the stochastically organized so called
random successor trees (RST), in which succes-
sor node’s and their predecessors are randomly
chosen, produce more cohesive structures than
lists (i.e. MDTs and MHTs), which form the
predominant format used to organize search re-
sults in Internet.
To sum up: Table (2) rates CTs in combi-
nation with model M3 on highest level. Thus,
from the point of view of lexical semantics CTs
realize more cohesive branches than MSTs. But
whether these differences are significant, is hard
to evaluate, since their theoretical distribution
is unknown. Thus, future work will be on find-
ing these distributions.
5 Conclusion
This paper proposed 3 numerical representation
formats as means for modeling the hierarchical
connotation of texts in combination with cohe-
sion trees. This was done by extending the weak
contextual hypothesis onto the level of texts in
combination with a reinterpretation of the con-
cept of lexical cohesion as a source for text link-
age. Although the formats used depart from
the bag of words model there is still the need of
investigating numerical formats which rely on
linguistically more profound discourse models.

References

R. O. Duda, P. E. Hart, and D. G. Stork. 2001.
Pattern Classification. Wiley, New York.

Michael A. K. Halliday and R. Hasan. 1976.
Cohesion in English. Longman, London.

M. A. Hearst and J. O. Pedersen. 1996. Reex-
amining the cluster hypothesis: Scatter/gath-
er on retrieval results. In Proc. ACM SIGIR.

M. A. Hearst. 1997. Texttiling: Segmenting
text into multi-paragraph subtopic passages.
Computational Linguistics, 23(1):33–64.

G. Herdan. 1966. The Advanced Theory of
Language as Choice and Chance. Springer,
Berlin.

W. Kintsch. 1998. Comprehension. A Paradigm
for Cognition. Cambridge University Press.

T. K. Landauer and S. T. Dumais. 1997. A solu-
tion to plato’s problem. Psychological Review,
104(2):211–240.

D. Lin. 1998. Automatic retrieval and cluster-
ing of similar words. In Proc. COLING-ACL.

D. Marcu. 2000. The Theory and Practice of
Discourse Parsing and Summarization. MIT
Press, Cambridge, Massachusetts.

G. A. Miller and W. G. Charles. 1991. Con-
textual correlates of semantic similarity. Lan-
guage and Cognitive Processes, 6(1):1–28.

J. Morris and G. Hirst. 1991. Lexical cohesion
computed by thesaural relations as an indi-
cator of the structure of text. Computational
Linguistics, 17(1):21–48.

B. Rieger. 1984. Semantic relevance and as-
pect dependency in a given subject domain.
In Proc. 10th COLING.

E. Riloff. 1995. Little words can make a big
difference for text classification. In Proc.
SIGIR-95.

G. Salton and C. Buckley. 1988. Term
weighting approaches in automatic text re-
trieval. Information Processing Management,
24(5):513–523.

H. Sch¨utze. 1998. Automatic word sense
discrimination. Computational Linguistics,
24(1):97–123.

S. Scott and S. Matwin. 1999. Feature engi-
neering for text classification. In Proc. 16th
ICML, pages 379–388.

M. Steinbach, G. Karypis, and V. Kumar. 2000.
A comparison of document clustering tech-
niques. In KDD Workshop on Text Mining.

J. Tuldava. 1998. Probleme und Methoden der
quantitativ-systemischen Lexikologie. Wis-
senschaftlicher Verlag, Trier.

M. Wolters and M. Kirsten. 1999. Exploring
the use of linguistic features in domain and
genre classication. In Proc. EACL.

Y. Yang and J. O. Pedersen. 1997. A compar-
ative study on feature selection in text cate-
gorization. In Proc. 14th ICML.
