Redundancy: helping semantic disambiguation 
Caroline Barri~re 
School of Information Technology and Engineering 
University of Ottawa 
Ottawa, Canada, KIN 7Z3 
barriere@site.uottawa.ca 
Abstract 
Redundancy is a good thing, at least in a learn- 
ing process. To be a good teacher you must 
say what you are going to say, say it, then say 
what you have just said. Well, three times is 
better than one. To acquire and learn knowl- 
edge from text for building a lexical knowledge 
base, we need to find a source of information 
that states facts, and repeats them a few times 
using slightly different sentence structures. A 
technique is needed for gathering information 
from that source and identify the redundant in- 
formation. The extraction of the commonality 
is an active learning of the knowledge expressed. 
The proposed research is based on a clustering 
method developed by Barri~re and Popowich 
(1996) which performs a gathering of related 
information about a particular topic. Individ- 
ual pieces of information are represented via the 
Conceptual Graph (CG) formalism and the re- 
sult of the clustering is a large CG embedding 
all individual graphs. In the present paper, we 
suggest that the identification of the redundant 
information within the resulting graph is very 
useful for disambiguation of the original infor- 
mation at the semantic level. 
1 Introduction 
The construction of a Lexical Knowledge Base 
(LKB), if performed automatically (or semi- 
automatically), attempts at extracting knowl- 
edge from text. The extraction can be viewed 
as a learning process. Simplicity, clarity and re- 
dundancy of the information given in the source 
text are key features for a successful acquisition 
of knowledge. We assume success is attained 
when a sentence from the source text expressed 
in natural language can be transformed into an 
unambiguous internal representation. Using a 
conceptual graph (CG) representation (Sowa, 
1984) of sentences means that a successful ac- 
quisition of knowledge corresponds to trans- 
forming each sentence from the source text into 
a set of unambiguous concepts (correct word 
senses found) and unambiguous relations (cor- 
rect semantic relations between concepts). 
This paper will look at the idea of making 
good use of the redundancy found in a text to 
help the knowledge acquisition task. Things are 
not always understood when they are first en- 
countered. A sentence expressing new knowl- 
edge might be ambiguous (at the level of the 
concepts it introduces and/or at the level of the 
semantic relations between those concepts). A 
search through previously acquired knowledge 
might help disambiguate the new sentence or it 
might not. A repetition of the exact same sen- 
tence would be of no help, but a slightly differ- 
ent format of expression might reveal necessary 
aspects for the comprehension. This is the av- 
enue explored in this paper which will unfold 
as follows. Section 2 will present briefly a pos- 
sible good source of knowledge and a gather- 
ing/clustering technique. Section 3 will present 
how the redundancy resulting from the cluster- 
ing process can be used in solving some types 
of semantic ambiguity. Section 4 will emphasize 
the importance of semantic relations for the pro- 
cess of semantic disambiguation. Section 5 will 
conclude. 
2 Source of information and 
clustering technique 
To acquire and learn knowledge from text for 
building a lexical knowledge base, we need to 
find a source of information that states facts, 
and repeats them a few times using slightly 
different sentence structures. A technique is 
needed for gathering information from that 
source and identify the redundant information. 
103 
These two aspects are discussed hereafter: (1) 
the choice of a source of information and (2) the 
information gathering technique. 
2.1 Choice of source of information 
When we think of learning about words, we 
think of textbooks and dictionaries. Redun- 
dancy might be present but not always simplic- 
ity. Any text is written at a level which assumes 
some common knowledge among potential read- 
ers. In a textbook on science, the author will 
define the scientific terms but not the general 
English vocabulary. In an adult's dictionary, 
all words are defined, but a certain knowledge of 
the "world" (common sense, typical situations) 
is assumed as common adult knowledge, so the 
emphasis of the definitions might not be on sim- 
ple cases but on more ambiguous or infrequent 
cases. To learn the basic vocabulary used in 
day to day life, a very simple children's first dic- 
tionary is a good place to start. In (Barri~re, 
1997), such a dictionary is used for an appli- 
cation of LKB construction in which no prior 
semantic knowledge was assumed. In the same 
research the author explains how to use a multi- 
stage process, to transform the sentences from 
the dictionary into conceptual graph represen- 
tations. This dictionary, the American Her- 
itage First Dictionary 1 (AHFD), is an ex- 
ample of a good source of knowledge in terms 
of simplicity, clarity and redundancy. Some def- 
initions introduce concepts that are mentioned 
again in other definitions. 
2.2 Gathering of information 
Barri~re and Popowich (1996) presented the 
idea of concept clustering for knowledge integra- 
tion. First, a Lexical Knowledge Base (LKB) is 
built automatically and contains all the nouns 
and verbs of the AHFD, each word having its 
definition represented using the CG formalism. 
Here is a brief summary of the clustering pro- 
cess from there. It is not a statistical cluster- 
ing but more a "graph matching" type of clus- 
tering. A trigger word is chosen and the CG 
representation of its defining sentences make up 
the initial CCKG (Concept clustering knowl- 
edge graph). The trigger word can be any word, 
1Copyright (~)1994 by Houghton Mifflin Company. 
Reproduced by permission from THE AMERICAN 
HERITAGE FIRST DICTIONARY. 
but preferably it should be a semantically sig- 
nificant word. A word is semantically signifi- 
cant if it occurs less than a maximal number 
of times in the text, therefore excluding gen- 
eral words such as place, or person. The clus- 
tering is really an iterative forward and back- 
ward search within the LKB to find definitions 
of words that are somewhat "related" to the 
trigger word. A forward search looks at the def- 
inition of the words used in the trigger word's 
definition. A backward search looks at the defi- 
nition of the words that use the trigger word to 
be defined. A word becomes part of the cluster 
if its CG representation shares a common sub- 
graph of a minimal size with the CCKG. The 
process is then extended to perform forward and 
backward searches based on the words in the 
cluster and not only on the trigger word. 
The cluster becomes a set of words related to 
the trigger word, and the CCKG presents the 
trigger word within a large context by showing 
all the links between all the words of the clus- 
ter. The CCKG is a merge of all individual CGs 
from the words in the cluster. 
Table 1 shows examples of clusters found by 
using the clustering technique on the AHFD. If 
a word is followed by _#, it means the sense # 
of that word. The CCKGs corresponding to the 
clusters are not illustrated as it would require 
much space to show all the links between all 
the words in the clusters? 
The clustering method described is based on 
the principle that information is acquired from 
a machine readable dictionary (the AHFD), and 
therefore each word is associated with some 
knowledge pertaining to it. To extend this clus- 
tering technique to a knowledge base containing 
non-classified pieces of information, we would 
need to use some indexing scheme allowing ac- 
cess to all the sentences containing a particular 
2The reader might wonder why such word as \[rain- 
bow\] is associated with \[needle_l\] or why \[kangaroo\] is 
associated with \[stomach\]. The AHFD tells the child 
that "A rainbow looks like a ribbon of many colors across 
the sky." and "Kangaroo mothers carry their babies in a 
pocket in ~ront of their stomachs." The threshold used 
to define the minimal size of the common subgraph nec- 
essary to include a new word in the cluster is established 
experimentally. Changing that threshold will change 
the size of the resulting cluster therefore affecting which 
words will be included. The clustering technique, and 
a derived extended clustering technique are explained in 
much details in (Barri~re and Fass, 1998). 
104 
Table 1: Multiple clusters from different words 
Trigger Cluster 
word 
needle_l {needle_l, sew, cloth, thread, wool, 
handkerchief, pin, ribbon, string, rainbow} 
sew 
kitchen 
stove 
stomach 
airplane 
elephant 
soap 
wash 
{sew, cloth, needle_i, needle_2, thread, 
button, patch_i, pin, pocket, wool, 
ribbon, rug, string, nest, prize, rainbow} 
kitchen, stove, refrigerator, pan} 
{stove; pan, kitchen, refrigerator, pot, clay} 
{stomach, kangaroo, pain, swallow, mouth} 
{airplane, wing, airport, fly_2, helicopter, 
jet, kit, machine, pilot, plane} 
{elephant, skin, trunk_l, ear, zoo, bark, 
leather, rhinoceros} 
{soap, dirt, mix, bath, bubble, suds, 
wash, boil, steam} 
{wash, soap, bath, bathroom, suds, 
bubble, boil, steam} 
word in them. 
3 Semantic disambiguation 
We propose ill this section a way to attempt at 
solving different types of semantic ambiguities 
by using the redundancy of information result- 
ing from the clustering technique as briefly de- 
scribed in the previous section. Going through 
an example, we will look at three types of se- 
mantic ambiguity: anaphora resolution, word 
sense disambiguation, and relation disambigua- 
tion. 
In Figure 1, Definition 3.1 shows one sen- 
tence in the definition of mail_l (taken from 
the AHFD, as all other definitions in Figure 1) 
with its corresponding CG representation. Def- 
inition 3.2 shows one sentence in the definition 
of stamp also with its CG representation. Using 
the clustering technique briefly described in the 
previous section, the two words are put together 
into a cluster triggered by the concept \[mail_l\]. 
Result 3.1 shows the maximal join 3 between 
the two previous graphs around shared concept 
\[mail_l\]. Combining the information from stamp 
and mail_l, puts in evidence the redundant in- 
formation. The reduction process for eliminat- 
ing this redundancy will solve some ambigui- 
ties. This process is based on the idea of find- 
ing "compatible" concepts within a graph. Two 
concepts are compatible if their semantic dis- 
tance is small. That distance is often based on 
aA maximal join is an operation defined within the 
CG formalism to gather knowledge from two graphs 
around a concept that they both share. 
the relative positions of concepts within the con- 
cept hierarchy (Delugach, 1993; Foo et ai., 1992; 
Resnik, 1995). For the present discussion we as- 
sume that two concepts are compatible if they 
share a semantically significant common super- 
type, or if one concept is a supertype of the 
other. 
In Result 3.1, the concept \[send\] is present 
twice, and also the concept \[letter\] is present 
in two compatible forms: \[letter\] and \[mes- 
sage\]. The compatibility comes from the pres- 
ence in the type hierarchy 4 of one sense of 
\[letter\], \[letter_2\], as being a subtype of \[mes- 
sage\]. These compatible forms actually allow 
the disambiguation of concept \[letter\] into \[let- 
ter_2\]. This should update the definition of 
stamp shown in Definition 3.2. The other sense 
of \[letter\], \[letter_l\] is a subtype of \[symbol\]. 
The pronoun they in Result 3.1 must refer to 
some word, either previously mentioned in the 
sentence, or assumed known (as a default) in the 
LKB. Both (agent) relations attached to con- 
cept \[send\] lead to compatible concepts: \[they\] 
and \[person\]. We can therefore go back to the 
graph definition of \[stamp\] in which the pronoun 
\[they\] could have referred to the concepts \[let- 
ters\], \[packages\], \[people\] or \[stamps\], and now 
disambiguate it to \[people\]. 
Result 3.2 shows the internal join which es- 
tablishes coreference links (shown by *x, *y, 
*z) between compatible concepts that are in an 
identical relation with another concept. The re- 
duced join, after the redundancy is eliminated, 
is shown in Result 3.3. 
Two types of disambiguation (anaphora res- 
olution and word sense disambiguation) were 
shown up to now. The third type of disam- 
biguation is at the level of the semantic re- 
lations. For this type of ambiguity, we must 
briefly introduce the idea of a relation hierar- 
chy which is described and justified in more de- 
tails in (Barri~re, 1998). A relation hierarchy, 
as presented in (Sowa, 1984), is simply a way 
to establish an order between the possible rela- 
tions. The idea is to include relations that cor- 
respond to the English prepositions (it could be 
the prepositions of any language studied) at the 
top of the hierarchy, and consider them gener- 
alizations of possible deeper semantic relations. 
4The type hierarchy has been built automatically 
from information extracted from the AHFD. 
105 
Definition 3.1 - 
MAIL_I : People send messages through the mail. 
\[send\]-> (agent)->\[person:plural\] 
-> (object)->\[message:plural\] 
-> (through)->\[mail_l\] 
Definition 3.2 - 
STAMP : People buy stamps to put on letters and packages they send through the mail. 
\[send\]-> (object)-> \[letter:plural\]-> (and)- > \[package:plu ral\] 
<-(on) <-\[put\] <-(goal) <-\[buy\]->(agent)-> \[person:plural\] 
-> (object)->\[stamp:plural\] 
- > (agent)- > \[they\] 
-> (through)->\[mail_l\] 
Result 3.1 - Maximal join between mail_l and stamp 
\[send\]-> (object)-> \[letter:plural\]-> (and)- > \[package:plu ral\] 
<-(on) <-\[put\] <-(goal) <-\[buy\]->(agent)-> \[person:plural\] 
-> (object)->\[stamp:plural\] 
-> (agent)-> \[they\] 
-> (through)-> \[mail_l\] <-(through)<-\[send\]-> (object)-> \[message:plural\] 
-> (agent)->\[person:plural\] 
Result 3.2 - Internal Join on Graph maiL1/stamp 
\[send *y\]->(object)->\[letter:plural *z\]->(and)->\[package:plural\] <-(on) <-\[put\] <-(goal) <-\[buy\]->(agent)-> \[person:plural\] 
->(object)->\[stamp:plural\] 
- > (agent)- > \[they *x\] 
->(through)->\[mail_l\]<-(through)<-\[send *y\]->(object)->\[message:plural *z\] 
-> (agent)->\[person:plural *x\] 
Result 3.3 - After reduction of graph mail_l/stamp 
\[letter_2:plural\]-> (and)-> \[package:plu ral\] 
<-(object) <-\[send\]-> (agent)-> \[person:plural\] 
-> (through)->\[mail_l\] 
<-(on) <-\[put\] <-(goal) <-\[buy\]-> (agent)->\[person:plural\] 
-> (object)-> \[stamp: plural\] 
Figure 1: Example of ambiguity reduction 
106 
Definition 3.3 - 
CARD_2 : You send cards to people in the mail. 
\[send\]-> (agent)- > \[you\] 
- > (object)-> \[card_2:plu ral\] 
-> (to)-> \[person:plural\] 
-> (in)->\[mail_l\] 
Result 3.4 - Graph mail_l/stamp joined to card (after internal join) 
\[letter_2:plural\]-> (and)- > \[package:plural\] 
<-(object) <-\[send *y\]->(agent)->\[person:plural *z\] 
-> (through)-> \[mail-l\] <-(in)<-\[send *y\]-> (to)-> \[person :plural\] 
-> (agent)-> \[you *z\] 
->(object)->\[card_2:plural\] 
<-(on) <-\[put\] <-(goal) <-\[buy\]-> (agent)-> \[person :plural\] 
-> (object)->\[stamp:plural\] 
Result 3.5 - After reduction of graph mail_l/stamp/card_2 
\[letter.2:plural\]-> (and)- > \[package: plural\]- > (and)-> \[card-2:plural\] 
<-(object) <-\[send\]-> (agent)- > \[person:plural\] 
->(manner)->\[mail_l\] 
->(to)-> \[person:plural\] 
<-(on)<-\[put\] <-(goal)<-\[buy\]-> (agent)-> \[person :plural\] 
-> (object)-> \[sta m p:plu ral\] 
Figure 1: Example of ambiguity reduction (continued) 
This relation hierarchy is important for the 
comparison of graphs expressing similar ideas 
but using different sentence patterns that are 
reflected in the graphs by different prepositions 
becoming relations. Let us look in Figure 1 at 
Definition 3.3 which gives a sentence in the defi- 
nition of \[card_2\] and Result 3.4 which gives the 
maximal join with graph mail_l/stamp from re- 
sult 3.3 around concept \[mail_l\]. 
Subgraphs \[send\]->(in)->\[mail_l\] and \[send\]- 
> (through)-> \[mail_l\] have compatible concepts 
on both sides of two different relations. These 
two prepositions are both supertypes of a re- 
stricted set of semantic relations. On Figure 2 
which shows a small part of the relation hierar- 
chy, we highlighted the compatibility between 
through and in. It shows that the two prepo- 
sitions interact at manner (at location as well 
but more indirectly). Therefore, we can estab- 
lish the similarity of those two relations via the 
manner relation, and the ambiguity is resolved 
as shown in Result 3.5. 
Note that the concept \[person\] is present 
many time among the different graphs in Fig- 
ure 1. This gives the reader an insight into the 
complexity behind clustering. It all relies on 
compatibility of concepts and relations. Com- 
patibility of concepts alone might be sufficient if 
the concepts are highly semantically significant, 
but for general concepts like \[person\], \[place\], 
\[animal\] we cannot assume so. In the graph pre- 
sented in Result 3.5, there are buyers of stamps, 
receivers and senders of letters and they are all 
people, but not necessarily the same ones. 
We saw the redundancy resulting from the 
clustering process and how to exploit this re- 
dundancy for semantic disambiguation. We see 
how redundancy at the concept level without 
the relations can be very misleading, and the 
following section emphasize the importance of 
semantic relations. 
107 
with \['~ I throughl at on 
accompanimen/ part-o/~ ,~~ 
instrument y ~ ~ point-in-time 
about 
Imannerl Ilocationl 
destination direction source 
Figure 2: Small part of relation taxonomy. 
4 The importance of semantic 
relations 
Clusters are and have been used in different 
applications for information retrieval and word 
sense disambiguation. Clustering can be done 
statistically by analyzing text corpora (Wilks 
et al., 1989; Brown et al., 1992; Pereira et 
al., 1995) and usually results in a set of words 
or word senses. In this paper, we are using 
the clustering method used in (Barri~re and 
Popowich, 1996) to present our view on re- 
dundancy and disambiguation. The clustering 
brings together a set of words but also builds a 
CCKG which shows the actual links (semantic 
relations) between the members of the cluster. 
We suggest that those links are essential in an- 
alyzing and disambiguating texts. When links 
are redundant in a graph (that is we find two 
identical links between two compatible concepts 
at each end) we are able to reduce semantic am- 
biguity relating to anaphora and word sense. 
The counterpart to this, is that redundancy at 
the concept level allows us to disambiguate the 
semantic relations. 
To show our argument of the importance of 
links, we present an example. Example 4.1 
shows a situation where an ambiguous word 
chicken (sense 1 for the animal and sense 2 
for the meat) is used in a graph and needs 
to be disambiguated. If two graphs stored in 
a LKB contain the word chicken in a disam- 
biguated form they can help solving the ambi- 
guity. In Example 4.1, Graph 4.1 and Graph 4.2 
have two isolated concepts in common: eat and 
chicken. Graph 4.1 and Graph 4.3 have the 
same two concepts in common, but the addi- 
tion of a compatible relation, creating the com- 
mon subgraph \[eat\]->(object)->\[chicken\], makes 
them more similar. The different relations be- 
tween words have a large impact on the meaning 
of a sentence. In Graph 4.1, the word chicken 
can be disambiguated to chicken_2. 
Example 4.1 - 
John eats chicken with a fork. 
Graph 4.1 - 
\[eat\]-> (agent)-> \[John\] 
->(with)->\[fork\] 
-> (object)- > \[chicken\] 
John's chicken eats grain. 
Graph 4.2 - 
\[eat\]-> (agent)-> \[chicken_l\] <-(poss) <-\[John\] 
-> (object)->\[grain\] 
108 
John likes to eat chicken at noon. 
Graph 4.3 - 
\[like\]-> (agent)-> \[John\] 
-> (goal)->\[eat\]-> (object)->\[chicken_2\] 
->(tirne)-> \[noon\] 
Only if we look at the relations between words 
can we understand how different each statement 
is. It's all in the links... Of course those links 
might not be necessary at all levels of text anal- 
ysis. If we try to cluster documents based on 
keywords, well we don't need to go to such a 
deep level of understanding. But when we are 
analyzing one text and trying to understand the 
meaning it conveys, we are probably within a 
narrow domain and the relations between words 
take all their importance. For example, if we 
are trying to disambiguate the word baseball 
(the sport or the ball), both senses of the words 
will occur in the same context, therefore using 
clusters of words that identify a context will not 
allow us to disambiguate between both senses. 
On the other hand, having a CCKG showing 
the relations between the baseball_l (ball), the 
bat, the player and the baseball_2 (sport), will 
express the desired information. 
5 Conclusion 
We presented the problem of semantic disam- 
biguation as solving ambiguities at the concept 
level (word sense and anaphora) but also at 
the link level (the relations between concepts). 
We showed that when gathering information 
around a particular subject via a clustering 
method, we tend to cumulate similar facts ex- 
pressed in slightly different ways. That redun- 
dancy is expressed by multiple copies of com- 
patible/identical concepts and relations in the 
resulting graph which is called a CCKG (Con- 
cept Clustering Knowledge Graph). The re- 
dundancy within the links (relations) helps dis- 
ambiguate the concepts they connect and the 
redundancy within the concepts helps disam- 
biguate the links connecting them. Clustering 
has been used a lot in previous research but only 
at the concept level; we propose that it is essen- 
tial to understand the links between the con- 
cepts in the cluster if we want to disambiguate 
between elements that share a similar context 
of usage. 

References 
C. Barri6re and D. Fass. 1998. Dictionary vali- 
dation through a clustering technique. To be 
published in the Proceedings of Euralex'98: 
Eight EURALEX International Congress on 
Lexicography, Belgium, August 1998. 
C. Barri~re and F. Popowich. 1996. Concept 
clustering and knowledge integration from a 
children's dictionary. In Proc. o\] the 16 ~h 
COLING, Copenhagen, Danemark, August. 
C. Barri~re. 1997. From a Children's First Dic- 
tionary to a Lexical Knowledge Base of Con- 
ceptual Graphs. Ph.D. thesis, Simon Fraser 
University, June. 
C. Barri~re. 1998. The relation hierarchy: 
one key to representing natural language 
using conceptual graphs. Submitted at 
ICCS98: International Conference on Con- 
ceptual Structures, to be held in Montpellier, 
France, August 1998. 
P. Brown, V.J. Della Pietra, P.V. deSouza, J.C. 
Lai, and R.L. Mercer. 1992. Class-based 11- 
gram models of natural language. Computa- 
tional Linguistics, 18(4):467-480. 
H. S. Delugach. 1993. An exploration into 
semantic distance. In H. D. Pfeiffer and 
T. E. Nagle, editors, Conceptual Structures: 
Theory and Implementation, pages 119-124. 
Springer, Berlin, Heidelberg. 
N. Foo, B.J. Garner, A. Rao, and E. Tsui. 1992. 
Semantic distance in conceptual graphs. In 
T.E. Nagle, J.A. Nagle, L.L. Gerholz, and 
P.W.Eklund, editors, Conceptual Structures: 
Current Research and Practice, chapter 7, 
pages 149-154. Ellis Horwood. 
F. Pereira, N. Tishby, and L. Lee. 1995. Distri- 
butional clustering of english words. In Proc. 
of the 33 th A CL, Cambridge,MA. 
Philip Resnik. 1995. Using information content 
to evaluate semantic similarity in a taxonomy. 
In Proc. o\] the 1.~ th IJCAI, volume 1, pages 
448-453, Montreal, Canada. 
J. Sowa. 1984. Conceptual Structures in Mind 
and Machines. Addison-Wesley. 
Y. Wilks, D. Fass, G-M Guo, J. McDonald, 
T. Plate, and B. Slator. 1989. A tractable 
machine dictionary as a resource for computa- 
tional semantics. In Bran Boguraev and Ted 
Briscoe, editors, Computational Lexicography 
for Natural Language Processing, chapter 9, 
pages 193-231. Longman Group UK Limited. 
