Recognition of synonyms by a lexical graph
Peter Siniakov
siniakov@inf.fu-berlin.de
Database and Information Systems Group, Freie Universit¨at Berlin
Takustr. 9, 14195 Berlin, Germany
Abstract
Semantic relationships between words
comprised by thesauri are essential fea-
tures for IR, text mining and informa-
tion extraction systems. This paper in-
troduces a new approach to identifica-
tion of semantic relations such as syn-
onymy by a lexical graph. The graph
is generated from a text corpus by em-
bedding syntactically parsed sentences
in the graph structure. The vertices
of the graph are lexical items (words),
their connection follows the syntactic
structure of a sentence. The struc-
ture of the graph and distances be-
tween vertices can be utilized to define
metrics for identification of semantic
relations. The approach has been eval-
uated on a test set of 200 German syn-
onym sets. Influence of size of the text
corpus, word generality and frequency
has been investigated. Conducted ex-
periments for synonyms demonstrate
that the presented methods can be ex-
tended to other semantic relations.
1 Introduction
Once predominantly used by human authors
to improve their style avoiding repetitions of
words or phrases, thesauri now serve as an im-
portant source of semantic and lexical infor-
mation for automatic text processing. The
electronic online thesauri such as WordNet
(2005) and OpenThesaurus (2005) have been
increasingly employed for many IR and NLP
problems. However, considerable human ef-
fort is required to keep up with the evolving
language and many subdomains are not suffi-
ciently covered (Turney, 2001). Many domain-
specific words or word senses are not included;
inconsistency and bias are often cited as fur-
ther major deficiencies of hand-made thesauri
(Curran and Moens, 2002), (Senellart and
Blondel, 2003). There is a continuous demand
for automatic identification of semantic rela-
tions and thesaurus generation. Such tools
do not only produce thesauri that are more
adapted to a particular application in a cer-
tain domain, but provide also assistance for
lexicographers in manual creation and keeping
the hand-written thesauri up to date. Numer-
ous applications in IR (e.g. query expansion)
and text mining (identification of relevant con-
tent by patterns) underline their usefulness.
2 Related work
Identification of semantic relations has been
approached by different communities as a com-
ponent of a knowledge management system or
application of a developed NLP framework.
Many approaches are guided by the assump-
tion that similar terms occur in similar context
and obtain a context representation of terms
as attribute vectors or relation tuples (Cur-
ran and Moens, 2002), (Ruge, 1997), (Lin,
1998). A similarity metric defined on the con-
text representations is used to cluster similar
terms (e.g. by the nearest neighbor method).
The actual definitions of context (whole doc-
ument (Chen and Lynch, 1992), textual win-
dow, some customized syntactic contexts, cf.
(Senellart and Blondel, 2003)) and similar-
ity metric (cf. (Manning and Sch¨utze, 1999),
(Curran and Moens, 2002)) are the essential
distinguishing features of the approaches.
A pattern-based method is proposed by
Hearst (Hearst, 1998). Existing relations in
the WordNet database are used to discover
regular linguistic patterns that are character-
istic for these relations. The patterns contain
lexical and syntactic elements and are acquired
32
from a text corpus by identifying common con-
text of word pairs for which a semantic rela-
tion holds. Identified patterns are applied to a
large text corpus to detect new relations. The
method can be enhanced by applying filtering
steps and iterating over new found instances
(Phillips and Riloff, 2002).
Lafourcade and Prince base their approach
on reduction of word semantics to conceptual
vectors (vector space is spanned by a hierarchy
of concepts provided by a thesaurus, (Lafour-
cade, 2001)). Every term is projected in the
vector space and can be expressed by the linear
combination of conceptual vectors. The angle
between the vectorial representations of two
terms is used in calculation of thematic close-
ness (Lafourcade and Prince, 2001). The ap-
proach is more closely related to our approach
since it offers a quantitative metric to measure
the degree of synonymy between two lexical
items.
In contrast, Turney (Turney, 2001) tries to
solve a quite simpler “TOEFL-like” task of se-
lecting a synonym to a given word from a set
of words. Mutual information related to the
co-occurrence of two words combined with in-
formation retrieval is used to assess the degree
of their statistical independency. The least in-
dependent word is regarded synonymous.
Blondell et al. (Blondel et al., 2004) encode
a monolingual dictionary as a graph and iden-
tify synonyms by finding subgraphs that are
similar to the subgraph corresponding to the
queried term.
The common evaluation method for similar-
ity metrics is comparing their performance on
the same test set with the same context repre-
sentations with some manually created seman-
tic source as the gold standard (Curran and
Moens, 2002). Abstracting from results for
concrete test sets, Weeds et al. (2004) try to
identify statistical and linguistic properties on
that the performance of similarity metrics gen-
erally depends. Different bias towards words
with high or low frequency is recognized as one
reason for the significant variance of k-nearest
neighbors sets of different similarity metrics.
3 Construction of the lexical graph
The assumption that similar terms occur in
similar context leads to the establishing of ex-
plicit context models (e.g. in form of vectors or
relation tuples) by most researchers. We build
an implicit context representation connecting
lexicalitemsinawaycorrespondingtothesen-
tence structure (as opposed to (Blondel et al.,
2004)), where a term is linked to every word
in its definition). The advantage of the graph
model is its transitivity: not only terms in the
immediate context but also semantically re-
lated terms that have a short path to the ex-
amined term (but perhaps have never occurred
in its immediate context) can contribute to
identification of related terms. The similarity
metric can be intuitively derived from the dis-
tance between the lexical vertices in the graph.
Figure 1: Main steps during graph construc-
tion
To construct the lexical graph articles from
five volumes of two German computer jour-
nals have been chunk-parsed and POS tagged
using TreeTagger (2004). To preserve the se-
mantic structure of the sentences during the
graph construction, i.e. to connect words that
build the actual statement of the sentence,
parsed sentences are preprocessed before be-
ing inserted in the graph (fig. 1). The punc-
tuation signs and parts of speech that do not
carry a self-contained semantics (such as con-
junctions, pronouns, articles) are removed in
a POS filtering step. Tokenization errors are
heuristically removed and the words are re-
placed by their normal forms (e.g. infinitive
form for verbs, nominative singular for nouns).
German grammar is characterized by a very
frequent use of auxiliary and modal verbs that
in most cases immediately precede or follow
the semantically related sentence parts such
as direct object or prepositional phrase while
the main verb is often not adjacent to the
related parts in a sentence. Since the direct
edge between the main verb and non-adjacent
related sentence parts cannot be drawn, the
33
sentence is syntactically reorganized by replac-
ing the modal or auxiliary verbs by the corre-
sponding main verb. Another syntactic rear-
rangement takes place when detachable pre-
fixes are attached to the corresponding main
verb. In German some prefixes of verbs are
detached and located at the end of the main
clause. Since verbs without a prefix have a
different meaning prefixes have to be attached
to the verb stem. The reorganized sentence
Figure 2: An example of a sentence trans-
formed in a lexical graph
can be added to the graph inserting the nor-
malized words in a sentence as vertices and
connecting the adjacent words by a directed
edge. However, some adjacent words are not
semantically related to each other, therefore
the lexical graph features two types of edges
(see an example in fig. 2). A property edge
links the head word of a syntactic chunk (verb
or noun phrase) with its modifiers (adverbs or
adjectives respectively) that characterize the
head word and is bidirectional. A sequential
edge connects the head words (e.g. main verbs,
head nouns) of syntactic chunks reflecting the
“semantic backbone” of the sentence.
The length of an edge represents how strong
two lexical items are related to each other and
depends therefore on the frequency of their
co-occurrence. It is initialized with a maxi-
mum length M. Every time an existing edge
is found in the currently processed sentence,
its current length CurLen is modified accord-
ing to CurLen = MM
CurLen+1
; hence the length
of an edge is inversely proportional to the fre-
quency of co-occurrence of its endpoints.
After all sentences from the text corpus
have been added to the lexical graph, vertices
(words) with a low frequency (≤ θ) are re-
moved from the graph to primarily acceler-
ate the distance calculation. Such rarely oc-
curring words are usually proper nouns, ab-
breviations, typos etc. Because of the low
frequency semantic relations for these words
cannot be confidently identified. Therefore
removing such vertices reduces the size of
the graph significantly without performance
penalty (the graph generated from 5 journal
volumes contained ca. 300000 vertices and
52191 after frequency filtering with θ = 8).
Experimental results feature even a slightly
better performance on filtered graphs. To pre-
serve semantic consistency of the graph and
compensate removal of existing paths the con-
nections between the predecessors and succes-
sors of removed vertices have to be taken into
account: the edge length e(p,s) between the
predecessor p to the successor s of the re-
moved vertex r can incorporate the length of
the path length(p,r,s) from p to s through
r by calculating the halved harmonic mean:
e(p,s) = e(p,s)∗lprse(p,s)+lprs. e(p,s) is the more re-
duced the smaller length(p,r,s) is and if they
are equal, e(p,s) is half as long after merging.
Beside direct edges an important indication
of semantic closeness is the distance, i.e. the
length of the shortest path between two ver-
tices. Distances are calculated by the Dijk-
stra algorithm with an upper threshold Θ.
Once the distances from a certain vertex reach
the threshold, the calculation for this vertex
is aborted and the not calculated distances
are considered infinite. Using the threshold
reduces the runtime and space considerably
while the semantic relation between the ver-
tices with distances > Θ is negligible.
The values of M, θ and Θ depend on
the particular text corpus and are chosen
to keep the size of the graph feasible. θ
can be determined experimentally increment-
ing it as long as the results on the test set
are improving. The resulting graph gener-
ated from five computer journals volumes with
M = 220, θ = 8, Θ = 60000 con-
tained 52191 vertices, 4,927,365 edges and
376,000,000 distances.
4 Identification of synonyms
The lexical graph is conceived as an instru-
ment to identify semantic relations such as
synonymy and hypernymy between lexical
items represented by its vertices. The main
34
focus of our research was finding synonyms al-
beit some results can be immediately trans-
ferred for identification of hyponyms. To pro-
vide a quantitative measure of synonymy dif-
ferent similarity metrics were defined on the
lexical graph. Given a word, the system uses
the metric to calculate the closest vertices to
the vertex that represents this word. The re-
sult is a ranked list of words sorted by the de-
gree of synonymy in descending order. Every
metric sim is normalized to be a probability
measure so that given a vertex vi the value
sim(vi,vj) can be interpreted as the probabil-
ity of vj being synonym to vi. The normaliza-
tion is performed for each metric sim by the
following functions:
nmin(sim(vi,vj)) = min(sim(vi,v1),...,sim(vi,vn))sim(vi,vj)
for metrics that indicate maximum similarity
to a vertex vi by a minimum value and
nmax(sim(vi,vj)) = sim(vi,vj)max(sim(vi,v1),...,sim(vi,vn))
for metrics that indicate maximum similarity
to a vertex vi by a maximum value, where
v1 ...vn are the set of graph vertices. In both
cases the top-ranked word has the maximum
likelihood of 1 to be a synonym of vi. The
normalized ranked lists are used for the com-
parison of different metrics and the evaluation
of the approach (see sec. 5).
A similarity metric is supposed to assess the
semantic similarity between two vertices of the
lexical graph. Since the distance metric Dis-
tanceM used for calculation of distances be-
tween the vertices in the graph indicates how
semantically related two vertices are, it can be
used as a similarity metric. As the graph is di-
rected, the distance metric is asymmetric, i.e.
the distance from vj to vi does not have to
be equal to the distance from vi to vj. The
major drawback of the DistanceM is that it
takes into account only one path between the
examined vertices. Even though the shortest
path indicates a strong semantic relation be-
tween the vertices, it is not sufficient to con-
clude synonymy that presupposes similar word
senses.
Therefore more evidence for strong seman-
tic relation with the particular aspect of sim-
ilar word senses should be incorporated in
the similarity metric. The property neigh-
bors of a vertex vi (adjacent vertices connected
with vi by the property edge) play significant
role in characterizing similar senses. If two
terms share many characteristic properties,
there is a strong evidence of their synonymy.
A shared property can be regarded as a witness
of the similarity of two word senses. There
are other potential witnesses, e.g. transitive
verbs shared by their direct objects; however,
we restricted this investigation to the property
neighbors as the most reliable witnesses.
The simple method to incorporate the con-
cept of the witnesses into the metric is to
determine the number of common property
neighbors:
NaivePropM(vi,vj) = |prop(vi)∩prop(vj)|
whereprop(vi) = {vk|e(i,k)isaproperty edge}
This method disregards, however, the different
degree of correlation between the vertices and
their property neighbors that is reflected by
the length of property edges. A property is the
more significant, the stronger the correlation
between the property and the vertex is, that
is the shorter the property edge is. The degree
of synonymy of two terms depends therefore
on the number of common properties and
the lengths of paths between these terms
leading through the properties. Analogously
to the electric circuit one can see the single
paths through different shared properties as
channels in a parallel connection and path
lengths as ”synonymy resistances”. Since a
bigger number of channels and smaller single
resistances contribute to the decreasing of the
total resistance (i.e. the evidence of synonymy
increases), the idea of WeiPropM metric is to
determine the similarity value analogously to
the total resistance in a parallel connection:
WeiPropMprime(vi,vj) =
parenleftBigg nsummationdisplay
k=1
1
length(vi,pk,vj)
parenrightBigg−1
where length(vi,pk,vj) = e(vi,pk) + e(pk,vj)
is the length of the path from vi to vj through
pk and pk ∈ prop(vi)∩prop(vj).
Another useful observation is that some
properties are more valuable witnesses than
the others. There are very general properties
that are shared by many different terms and
35
some properties that are characteristic only
for certain word senses. Thus the number of
property neighbors of a property can be re-
garded as a measure of its quality (in the sense
of characterizing the specific word meaning).
WeiPropM integrates the quality of a prop-
erty by weighting the paths leading through it
by the number of its property neighbors:
WeiPropM(vi,vj) =
parenleftBigg nsummationdisplay
k=1
1
(e(vi,pk)+e(pk,vj))∗|prop(pk)|
parenrightBigg−1
where pk ∈ prop(vi)∩prop(vj).
WeiPropM measures the correlation be-
tween two terms based on the path lengths.
Frequently occurring words tend to be ranked
higher because the property edge lengths indi-
rectly depend on the absolute word frequency.
Because of high absolute frequency of words
the frequency of their co-occurrence with dif-
ferent properties is generally also higher and
the property edges are shorter. Therefore to
compensate this deficiency (i.e. to eliminate
the bias discussed in (Weeds et al., 2004))
an edge length from a property to a ranked
term e(pk,vj) is weighted by the square root
of its absolute frequency
radicalBig
freq(vj). Using
the weighted edge length between the property
and the ranked term we cannot any longer cal-
culate the path length between vi and vj as the
sum length(vi,pk,vj) = e(vi,pk) + e(pk,vj) ∗radicalBig
freq(vj) because the multiplied second com-
ponent significantly outweighs the first sum-
mand. Relative path length can be used in-
stead where both components are adequately
taken into account and added relatively
to the minimum of the respective compo-
nent: let min1 be min(e(vi,pa),...,e(vi,pn))
where pk ∈ prop(vi) and min2 =
min(...,e(pk,vj) ∗
radicalBig
freq(vj), ...) where
pk ∈ prop(vi)∩prop(vj). Relative path length
would be e(vi,pk)min1 + e(pk,vj)∗
√
freq(vj)
min2 . Furtherexperimental observation suggests that when
searching for synonyms of vi the connection
between vi and the property is more signifi-
cant than the second component of the path –
the connection between the property and the
ranked term vj. Therefore when calculating
the relative path length the first component
has to be weighted stronger (the examined ra-
tio was 2:1). The corresponding metric can be
defined as follows:
FirstCompM(vi,vj) =parenleftbigg
summationtextn
k=1 1RelPathLength(k)∗√|prop(pk)|
parenrightbigg−1
where RelPathLength(x) =
2
3 ∗
e(vi,px)
min1 +
1
3 ∗
e(px,vj)∗
radicalBig
freq(vj)
min2
As opposed to NaivePropM and WeiPropM
FirstCompM is not symmetric because of the
emphasis on the first component.
5 Experiments
For evaluation purposes a test corpus of
200 synonym sets was prepared consulting
(OpenThesaurus, 2005). The corpus con-
sists of 75 everyday words (e.g. “Pr¨asident”
(president), “Eingang” (entrance) “Gruppe”
(group)), 60 abstract terms (e.g. “Ursache”
(reason), “Element”, “Merkmal” (feature))
and 65 domain-specific words (e.g. “Software”,
“Prozessor” (CPU)). The evaluation strat-
egy is similar to that pursued in(Curran and
Moens, 2002). The similarity metrics do not
distinguish between different word senses re-
turning synonyms of all senses of the polyse-
mous words in a single ranked list. Therefore
the synonym set of a word in the test cor-
pus is the union of synonym sets of its senses.
To provide a measure for overall performance
and to compare the different metrics a func-
tion measuring the similarity score (SimS) was
defined that assigns a score to a metric for
correctly found synonyms among the 25 top-
ranked. The function assigns 25 points to
the correctly found top-ranked synonym of vi
(SimS(0,vi) = 25) and 1 point to the syn-
onym with the 25th rank (SimS(25,vi) = 1).
The rank of a synonym is decreased only by
false positives that are ranked higher (i.e. each
of correctly identified top n synonyms has rank
0). In order to reward the top-ranked syn-
onyms stronger the scoring function features a
hyperbolic descent. For a synonym of vi with
the rank x:
SimS(x,vi) =


0, if x /∈ synset(vi)
24∗√26
(√26−1)∗√x+1 +1−
24√
26−1



36
To compare performance of different
metrics the SimS values of the top 25
words in the ranked list were summed
for each word of a test corpus. The to-
tal score of a similarity metric Sim issummationtext
200
i=1
summationtext25
j=1 SimS(rank(RankedList(vi,j)),vi)
where RankedList(vi,j) returns the word at
the position j from the ranked list produced
by Sim for vi and v1,...,v200 are the words
of the test corpus.
Besides, a combined precision and recall
measure Π was used to evaluate the ranked
lists. Given the word vi, we examined the first
n words (n = 1,5,25,100) of the ranked list
returned by a similarity metric for vi whether
they belong to the synset(vi) of the test cor-
pus. Π(n) will measure precision if n is less
than the size of the synset(vi) because the
maximum recall can not be reached for such
n and recall otherwise because maximum pre-
cision cannot be reached for n > |synset(vi)|.
The Π values were averaged over 200 words.
Table 1 presents the result of evaluating the
similarity metrics introduced in sec. 4. The
results of DistanceM confirm that regarding
distance between two vertices alone is not
sufficient to conclude their synonymy. Dis-
tanceM finds many related terms ranking gen-
eral words with many outgoing and incoming
edges higher, but it lacks the features pro-
viding the particular evidence of synonymy.
NaivePropM is clearly outperformed by the
both weighted metrics. The improvement rel-
ative to the DistanceM and acceptable pre-
cision of the top-ranked synonyms Π(1) show
that considering shared properties is an ad-
equate approach to recognition of synonyms.
Ignoring the strength of semantic relation in-
dicated by the graph and the quality of prop-
erties is the reason for the big gap in the
total score and recall value (Π(100)). Both
weighted metrics achieved results comparable
with those reported by Curran and Moens
in (Curran and Moens, 2002) and Turney in
(Turney, 2001). Best results of FirstCompM
confirm that the criteria identified in sec. 4
such as generality of a property, abstraction
from the absolute word frequency etc. are rel-
evant for identification of synonyms. First-
CompM performed particularly better in find-
ing synonyms with the low frequency of occur-
rence.
In another set of experiments we investi-
gated the influence of the size of the text cor-
pus (cf. fig. 3). The plausible assumption
is the more texts are processed, the better
the semantic connections between terms are
reflected by the graph, the more promising re-
sults are expected. The fact that the num-
ber of vertices does not grow proportionally
to the size of text corpus can be explained by
word recurrence and growing filtering thresh-
old θ. However, the number of edges increases
linearly and reflects the improving semantic
coverage. As expected, every metric performs
considerably better on bigger graphs. While
NaivePropM seems to converge after three
volumes, the both weighted metrics behave
strictly monotonically increasing. Hence an
improvement of results can be expected on big-
ger corpora. On the small text corpora the re-
sults of single metrics do not differ significantly
since there is not sufficient semantic informa-
tion captured by the graph, i.e. the edge and
path lengths do not fully reflect the seman-
tic relations between the words. The scores
of both weighted metrics grow, though, much
faster than that of NaivePropM. FirstCompM
achieves the highest gradient demonstrating
the biggest potential of leveraging the grow-
ing graph for finding synonymy.
Figure 3: Influence of the size of the text cor-
pus.
To examine the influence of the word cat-
egories results on the subsets of the text cor-
pus corresponding to a category are compared.
All metrics show similar behavior, therefore
we restrict the analysis to the Π values of
37
Metric Score Π(1) Π(5) Π(25) Π(100)
DistanceM 2990.7 0.20 0.208 0.199 0.38
NaivePropM 6546.3 0.415 0.252 0.271 0.440
WeiPropM 9411.7 0.54 0.351 0.398 0.607
FirstCompM 11848 0.575 0.412 0.472 0.637
Table 1: Results of different metrics on the test corpus
FirstCompM (fig. 4). Synonyms of domain-
specific words are recognized better than those
of abstract and everyday words. Their se-
mantics are better reflected by the technically
oriented texts. The Π values for abstract
and everyday words are pretty similar except
for the high precision of top-ranked abstract
synonyms. Everyday words suffer from the
fact that their properties are often too gen-
eral to uniquely characterize them, which in-
volves loss of precision. Abstract words can
be extremely polysemous and have many sub-
tle aspects that are not sufficiently covered by
the texts of computer journals.
Figure 4: Dependency of Π(n) on word cate-
gory (results of FirstCompM metric)
To test whether the metrics perform bet-
ter for the more frequent words the test set
was divided in 9 disjunctive frequency clus-
ters (table 2). FirstCompM achieved consid-
erably better results for very frequently occur-
ring words (≥ 4000 occurrences). This con-
firms indirectly the better results on the big-
ger text corpora: while low frequency does not
exclude random influence, frequent occurrence
involves adequate capturing of the word se-
mantics in the graph by inserting and adjust-
ing all relevant property edges. These results
do not contradict the conclusion that First-
CompM is not biased towards words with a
certain frequency because the mentioned bias
pertains to retrieval of synonyms with a cer-
tain frequency, whereas in this experiment the
performance for different word frequencies of
queried words is compared.
6 Conclusion
We have introduced the lexical graph as an
instrument for finding semantic relations be-
tween lexical items in natural language cor-
pora. The big advantage of the graph in com-
parison to other context models is that it cap-
tures not only the immediate context but es-
tablishes many transitive connections between
related terms. We have verified its effective-
ness searching for synonymy. Different met-
rics have been defined based on shortest path
lengths and shared properties. Similarity met-
ric FirstCompM that best leverages the graph
structure achieved the best results confirming
the significant role of number of shared proper-
ties, frequency of their co-occurrence and the
degree of their generality for detecting of syn-
onymy. Significantly improving results for big-
ger text corpora andmorefrequently occurring
words are encouraging and promising for de-
tection of other semantic relations. New meth-
ods that increasingly employ the graph struc-
ture e.g. regarding the lengths and number
of short paths between two terms or extend-
ing the witness concept to other morphological
types are the subject of further research.
Acknowledgements
I would like to thank Heiko Kahmann for
the valuable assistance in implementation and
evaluation of the approach. This research is
supported by NaF¨oG scholarship of the fed-
eral state Berlin.

References
Vincent D. Blondel, Anah Gajardo, Maureen Hey-
mans, Pierre Senellart, and Paul Van Dooren.
2004. A measure of similarity between graph
vertices. With applications to synonym extrac-
tion and web searching. In SIAM Review, pages
647–666.
Hsinchun Chen and Kevin J. Lynch. 1992. Auto-
matic construction of networks of concepts char-
acterizing document databases. In IEEE Trans-
actions on Systems, Man and Cybernetics, vol-
ume 22(5), pages 885–902.
James R. Curran and Marc Moens. 2002. Improve-
ments in automatic thesaurus extraction. In
Proceedings of the Workshop of the ACL Special
Interest Group on the Lexicon (SIGLEX), pages
59–66. Association for Computational Linguis-
tics.
M. A. Hearst. 1998. Automated discovery of
Wordnet relations. In C. Fellbaum, editor,
Wordnet An Electronic Lexical Database, pages
131–151. MIT Press, Cambridge, MA.
Mathieu Lafourcade and Violaine Prince. 2001.
Relative synonymy and conceptual vectors. In
Proceedings of the NLPRS, Tokyo, Japan.
Mathieu Lafourcade. 2001. Lexical sorting and
lexical transfer by conceptual vectors. In Pro-
ceedings of the First International Workshop on
MultiMedia Annotation, Tokyo, Japan.
Dekang Lin. 1998. An information-theoretic
definition of similarity. In Proceedings of the
Fifteenth International Conference on Machine
Learning, pages 296–304, Madison, WI.
Christopher D. Manning and Hinrich Sch¨utze.
1999. Foundations of Statistical Natural Lan-
guage Processing. MIT Press, Cambridge, MA
2000.
OpenThesaurus. 2005. OpenThesaurus
- Deutscher Thesaurus. http://www.
openthesaurus.de.
William Phillips and Ellen Riloff. 2002. Exploit-
ing strong syntactic heuristics and co-training
to learn semantic lexicons. In Proceedings of the
2002 Conference on Empirical Methods in NLP.
Gerda Ruge. 1997. Automatic detection of the-
saurus relations for information retrieval appli-
cations. Foundations of Computer Science: Po-
tential - Theory - Cognition, LNCS 1337:499–
506.
Pierre P. Senellart and Vincent D. Blondel. 2003.
Automatic discovery of similar words. In
Michael Berry, editor, Survey of Text Mining.
Clustering, classification, and retrieval, pages
25–44. Springer Verlag, Berlin.
TreeTagger. 2004. http://www.ims.
uni-stuttgart.de/projekte/corplex/
TreeTagger/.
Peter D. Turney. 2001. Mining the Web for syn-
onyms: PMI–IR versus LSA on TOEFL. Lec-
ture Notes in Computer Science, 2167:491–502.
Julie Weeds, David Weir, and Diana McCarthy.
2004. Characterising measures of lexical distri-
butional similarity.
WordNet. 2005. http://wordnet.princeton.
edu/w3wn.html.
