Inherited Feature-based Similarity Measure Based on Large 
Semantic Hierarchy and Large Text Corpus 
Hideki Hirakawa 
Toshiba I{.&D Center 
1 Komukai Toshiba-cho, Saiwai-ku, 
Kawasaki 210, JAPAN 
hirakawa¢ee:l., rdc. toshiba, co. j p 
Zhonghui Xu, Kenneth Haase 
MIT Media Laboratory 
20 Ames Street 
Cambridge, MA 02139 USA 
{xu, haase} @media. mit. edu 
Abstract 
We describe a similarity calculation 
model called IFSM (Inherited Feature 
Similarity Measure) between objects 
(words/concepts) based on their com- 
mon and distinctive features. We pro- 
pose an implementation method for ob- 
taining features based on abstracted 
triples extracted fi'om a large text eorpus 
utilizing taxonomical knowledge. This 
model represents an integration of tradi- 
tional methods, i.e,. relation b~used sin> 
itarity measure and distribution based 
similarity measure. An experiment, us- 
ing our new concept abstraction method 
which we <'all the fiat probability group- 
ing method, over 80,000 surface triples, 
shows that the abstraction level of 3000 
is a good basis for feature description. 
'1 Introduction 
Determination of semantic similarity between 
words is an important component of linguis- 
tic tasks ranging from text retrieval and filter- 
ing, word sense disambiguation or text match- 
ing. In the past five years, this work has evolved 
in conjunction with the availability of powerful 
computers and large linguistic resources such as 
WordNet (Miller,90), the EDR concept dictionary 
(EDR,93), and large text corpora. 
Similarity methods can be broadly divided into 
"relation based" methods which use relations in 
an ontology to determine similarity and "distribu- 
tion based" methods which use statistical analysis 
as the basis of similarity judgements. This article 
describes a new method of similarity nmtehing, in- 
herited feature based similarity matching (IFSM) 
which integrates these two approaches. 
Relation based methods include both depth 
based and path based measures of similarity. 
The Most Specific Common Abstraction (MSCA) 
method compares two concepts based on the tax- 
onomic depth of their common parent; for exam- 
ple, "dolphin" and "human" are more similar than 
"oak" and "human" because the common concept 
"mammal" is deeper in the taxonomy than "living 
thing". 
Path-length similarity methods are based on 
counting the links between nodes in a semantic 
network. (Rada,89) is a widely adopted approach 
to such matching and (Sussna,93) combines it 
with WordNet to do semantic disambiguation. 
The chief problems with relation-b~sed similar- 
ity methods lie in their sensitivity to artifacts in 
the coding of the ontology, l;or instance, MSCA 
algorithms are sensitive to the relative deplh and 
detail of different parts of the concept taxon- 
omy. If one conceptual domain (say plants) is 
sketchily represented while another conceptual do- 
main (say,, animals) is richly represented, similar- 
ity comparisons within the two domains will be in- 
commensurable. A similar problem plagues path- 
length based algorithms, causing nodes in richly 
structured parts of the ontology to be consistently 
judged less similm" to one another than nodes in 
shallower or hess complete parts of the ontology. 
Distribution-based methods are based on the 
idea that the similarity of words can be derived 
frorn the similarity of the contexts in which they 
occur. These methods difl'er most significantly 
in the way they characterize contexts and the 
similarity of contexts. Word Space (Schutze,93) 
uses letter 4-grams to characterize both words and 
the contexts in which they appear. Similarity is 
based on 4-grams in common between the con- 
texts. Church and tlanks ('89) uses a word win- 
dow of set size to characterize the context of a 
word based on the immediately adjacent words. 
Other methods include the use of expensive-to- 
derive features such as subject-verb-object (SVO) 
relations (Hindle,90) or other grammatical rela- 
tions (Grefenstette,94). These choices are not sim- 
ply iml)lemelltational but imply ditferent similar- 
ity judgements. The chief problem with distribu- 
tion based methods is that they only permit the 
formation of first-order concepts definable directly 
in terms of the original text. Distribution based 
methods can acquire concepts b~sed on recurring 
patterns of words but not on recurring patterns 
of concepts. \[,'or instance, a distributional sys- 
tem could easily identify that an article involves 
lawyers based on recurring instances of words like 
"sue" or "court". But it could not use the oc~ 
currence of these concepts as conceptual cues for 
508 
<lewfloping coneel)ts like "lit igadon" or "l)\]eading" 
in connection with the "lawyer" eoncel)t. 
One. notable integration of relation t)ased and 
distrilmtional methods is llesnik's annotation of 
a relational ontology wil h distributional in fornla- 
lion (l{esnik,95a,95b). \]lesnik inLroduees a "class 
probability" associated with nodes (synset.s)in 
WoMNet and uses these to determiue similarity. 
Given these probabilities, he eOltlptttes tile simi- 
larit.y of concepts I+)ased on the "inl'on nation" that 
wou\](l be necessary to distinguish them, tneasured 
ttsing iMbrmalion-theoretie calculations+ 
The Feature-based Similarity 
Measure 
The Inherited Feature Similarity Measure (IFSM) 
is another integrated approach to measuring simi- 
la.rity. It uses a semantic knowledge base where 
concepts are annotated wit\]\] disli<qlli.sbiW\] fi'a- 
ltu'es and i)ases similarity on (:otnl>aril~.g these sels 
of feal;ures. In our exl)erime\]tts, we deriw>d the 
feature sets I) 3, a distJ'ilmtiona\] analysis of +t large 
t :Ol: l) tiN. 
Most existing relation-hase(l similarity methods 
directly use l,he relat:iotl ~Ol)O/ogy of the seman- 
tic network to derive similarity, either by strate- 
gies like link counting (f~a(la,89) or tim determina- 
tion of the depth <)f <:otnmon al)slra<:lions (Kolod: 
net,g9). \[FSM, in eontrasl., uses the I:Ol)O\]Ogy to 
derive (leseril)lions whose (:omparisotl yields a sim- 
ilarity measure. In l)arti(:ular, it aSSlllnes art Otl- 
I:o\[ogy where: 
I. Each (:once\])l; has a set of features 
2. Each concept inherits Features from its get> 
erMizations (hypernyms) 
3. \]!;;u:h concept has one or more "(listinctiw~ 
features" which are not inherite(l ft:om its hy- 
\])el:nylllS. 
Note that we neidter claim nor require t:hat the 
features eonq>letely charaelerize their (:(mcepts or 
lhat inh<'.ritan<:e of feal m:es is sound. We only re- 
quire dlat there I)e some set of feal;ul:es we use for 
similarity judgcmettts. For instance, a similarity 
.iudgenle31t betwe(+m a penguin and a rot)in will t)e 
partially based on the fe++ture "ean-\[ly" assigned 
to the concel)t bird, ewm though it (toes not apl)ly 
it~dividually to t)et\]guins. 
Fig I shows a Siml)le exatnple o\[' a fragment of 
a (:oncel~ttud taxonomy wiLl~ associated featttres. 
Inherited features are in italh: while disliuctive 
llalcnl(< h~vu-chiid:>) 
falhcl(< male >< \]lave child >) iflothel(< female >< hove-chihl >1 
Fig. 1 Fragment ot' c(mccptual taxonomy 
\[Salutes are in bold. In our model, features have 
a weight l)ased otl the importance o1' the feature 
to the eolleel)t. 
We \[laV(~ chosel\] to alltOlIla, tieally gel\]erate {'ea- 
tures (listril)utionally by analyzing a large eOrl)US. 
We (leseribe lids geueration process below, but we 
will \[irst ttlrtl to the e\qthlgti()tl of similarity based 
on feat ural analysis. 
2.1 At)i)roaehes to Featm'e Matelfing 
'l'here are a variety of similarity measures awu\]- 
able for sets of \[;~atm'es, biLL all make their eom- 
l)arisons t)ase(l on some combination of shared 
\['etltlH;es, disLilleL \['eal ttres, altd sharect ttl)sellL l'ea-. 
tures (e.g., neither X or Y is red). For example, 
Tversky ('77) proposes a mode\] (based on huntan 
similarity judgements) where similarity is a linear 
combination of shared and distinct features where 
each f('atm'e is weighted 1)ased on its itnl)ortatme+ 
'l'w>rsky's experiment showed the highesl eorrela- 
lion with hunmn subjects ~ feelings when weighted 
shared and dislinet features are taken into consi(l- 
eration. 
HI~X'I'ANT ((~reii:nstette,94) introduce(1 the 
\Veighted 3aeeard Measure which combitms the 
Jaeeard Measure with weights derive(l froth an 
inh)rmation theoreti<: anMysis of %ature occur- 
fences+ '\]'he we:ight of a feature is com\[mte(l from 
a global weight (based on the nmuber of glohal 
occurrences of the, wor(l or concept) and a \[()(:at 
weight (based Oil the \['re(lllellcy Of tlt+> Features at- 
laehed to the word). 
\]n our (:urrent work. we have adol)te(t the 
Weighted .laeeard Measure for prelimit,ary ewJ- 
tmti(m of otu" al)lJroaeh. 'l'he clistinetiw" feature 
of our apl):roach is the rise of the ontology I.o (|e+ 
rive features rather than assuming atomic sets of 
Rmtures. 
2.2 Properties of IFSM 
/u this section we compare IFSM's similarity 
judgements to those generated by other tneth- 
()<Is. In our diseltssiou, we will consider the sim- 
ple netwoH¢ o~' Fig 2. We will use 1he expression 
sim.(ci, cj' ) to denote the similarity of eoncel)ts (h 
arid e2. 
Given lhe situation of Fig 2, both MS(L,\ 
an(t tlesnik's M ISM (Most In formative Sul>stmtor 
Method) asse,'t .s'im(Ct,C2) = sirn(C2, C3). 
MSCA makes the sitnilarit.y the satile because they 
have the sante (nearest) eotmnon abstraction CO. 
MISM holds the similarity Io be the same 1)eeause 
( :ll 
('l (:2 
~"( "3 
I"ig.2 I"xanal)le of a hierardtical strttctul'e 
509 
Hi~h~ 
Fig.3 lsonlol-phic suhstractures 
in higher/lower levels of hierarchy 
the assertion of C2 adds no information given the 
assertion of C3. Path-length methods, in contrast, 
assert sire(C1, C2) < sire(C2, C3) since the num- 
ber of links between the concepts is quite different. 
Because IFSM depends on the features derived 
from the network rather than on the network it- 
self, judgements of similarity depend on the ex- 
act features assigned to C1, C2, and C3. Because 
IFSM assumes that some distinctive features ex- 
ist for C3, sire(el, 62) and sire(el, C3) are un- 
likely to be identical. In fact, unless the distinc- 
tive features of C3 significantly overlap the dis- 
tinctive feature of C1, it will be the case that si,~(C1, C2) < si,~(C2, C3). 
IFSM differs from the path length model be- 
cause it is sensitive to depth. If we assume a rel- 
atively uniform distribution of features, the total 
number of features increases with depth in the hi- 
erarchy. This means that sim(C0,C1) located in 
higher part of the hierarchy is expected to be less 
than sim(C2,C3) located in lower part of the hi- 
erarchy. 
3 Components of IFSM model 
IFSM consists of a hierarchical conceptual the- 
saurus, a set of distinctive features assigned to 
each object and weightings of the features. We 
can use, for example, WordNet or the EDR con- 
cept dictionary as a hierarchical conceptual the- 
saurus. Currently, there are no explicit methods 
to determine sets of distinctive features and their 
weightings of each object (word or concept). 
Here we adopt an automatic extraction of fea- 
tures and their weightings from a large text cor- 
pus. This is the same approach as that of the dis- 
tribdted semantic models. However, in contrast 
to those models, here we hope to make the level 
of the representation of features high enough to 
capture semantic behaviors of objects. 
For example, if one relation and one object can 
be said to describe the features of object, we can 
define one feature Of "human" as "agent of walk- 
ing". If more context is allowed, we can define 
a feature of "human" as "agent of utilizing fire". 
A wider context gives a precision to the contents 
of the features. However, a wider context expo- 
nentially increases the possible number of features 
which will exceed current limitations of computa- 
tional resources. In consideration of these factors, 
we adopts triple relations such as "dog chase cat", 
"cut paper with scissors" obtained from the cot- 
"k dog chases a cat" "k hound chases a cat" "A dog chases a kitty" 
("chase" "dog" "cat") (*'chase" "hound" "cat") ("chase" "dog" "kitty") 
.... .............. o' o .............. o / ....... o ........... 
' "1 I / / |.(so ~ ~ ,,.0.J ~0, lo/ /-<so ~,~ h,~.~ ki,,.51 
.... ..................... i,,,,,il .............. i ............... W ............ i .......... O-iso ~,,,,~c Jo~ ¢,~, -,,) 
0 0 (9 0 i 
.- .e%ed . .r,ooe Tri,l s ............ N .............. N .............. ....... 
i O O O¢ O O .O O O O O O 
L. Deep Triples ............ ~~ ....... \[ ........ ~ .... 
/ / / /(SO v228 n5 n9 ... 5.3 32 ("dll, Se" "nm after")("dog" "hound"~{"cat" "kitty")) 
:~0 0 O" 0 0 0 0 0 i,,, II 1 1 
• Abstracted Triples ............................................................... 
I I I 
I Filtering Heuristics I 
0 0 
,,, Filtered Abstracted Triples ................................................... 
Fig.4 Abstracted triple extraction from corpus 
pus as a resource of features, and apply class based 
abstraction (Resnik 95a) to triples to reduce the 
size of the possible feature space. 
As mentioned above, features extracted fi'om 
the 
corpus will be represented using synsets/concepts 
in IFSM. Since no large scale corpus data with 
semantic tags is available, the current implemen- 
tation of IFSM has a word sense disambiguation 
problem in obtaining class probabilities. Our cur- 
rent basic strategy to this problem is similar to 
(Resnik,95a) in the sense that synsets associated 
with one word are assigned uniform frcquency or 
"credit" when that word appears in the corpus. 
We call this strategy the "brute-force" approach, 
like Resnik. On top of this strategy, we introduce 
filtering heuristics which sort out unreliable flata 
using heuristics based on the statistical properties 
of the data. 
4 The feature extraction process 
This section describes the feature extraction pro- 
cedure. If a sentence "a dog chased a cat" ap- 
pears in the corpus, features representing "chase 
cat" and "dog chase" may be attached to "dog" 
and "cat" respectively. Fig 4 shows the overall 
process used to obtain a set of abstracted triples 
which are sources of feature and weighting sets for 
synsets. 
4.1 Extraction of surface typed triples 
from the corpus 
Typed surface triples are triples of surface words 
holding some fixed linguistic relations (Hereafter 
call this simply "surface triples"). The current im- 
plementation has one type "SO" which represents 
510 
"subject - verb - object" relation. A set of typed 
surface triples are extracted from a corpus with 
their frequencies. 
Surface triple set 
(TYPE VERB NOUN1 NOUN2 FREQUENCY) 
Fx. (SO "ch~se" "<log" "cat" 10) 
4.2 Expansion of sin-face triples to deep 
triples 
Surface triples are expanded to corresponding 
deep triples (triples of synset IDs) by expanding 
each surface word to its corresponding synsets. 
The frequency of the surface triples is divided by 
the number of generated deep triples and it is as- 
signed to each deep triple. The frequency is also 
preserved ~ it is as an occurrence count. Surface 
words are also reserved for later processings. 
Deep triple collection 
(TYPE V-SYNSE'F N1-SYNSET N2-SYNSEq' FREQENCY 
OCCUttRENCE V-WORD NI-WORD N2-WORI)) 
Ex. (SO v123 n5 n9 0.2 10 "chase" "<log" "cat") 
"v123" and "n5" are synset IDs corresponding 
to word "chase" and "dog" respectively, These 
deep triples are sorted and merged. The frequen- 
cies and the occurrence counts are summed up 
respectively. The surface words are merged into 
surface word lists as the following example shows. 
Deep triple set 
(TYPE V-SYNSET N1-SYNSEq' N2-SYNSET FREQUENCY 
OCCURRENCE V-WOttDS N1-WORDS N2-WORDS) 
gx. (SO v123 n5 n9 0.7 15 
(" ch msc" ) (" dog" "hou nd ") ("cat ")) 
In this example, "dog" and "hound" have same 
synset ID "n9". 
4.3 Synset abstraction method 
The purpose of the following phases is to extract 
featm:e sets for each synset in an abstracted form. 
In an abstracted form, the size of each lhature 
space becomes tractable. 
Abstraction of a syuset can be done by divid~ 
ing whole synsets into the appropriate number of 
synset groups and determining a representative of 
each group to which each member is abstracted. 
There are several methods to decide a set of synset 
groups using a hierarchical structure. One of the 
simplest methods is to make groups by cutting the 
hierarchy structure at some depth from the root. 
We call this the flat-depth grouping method. An- 
other method tries to make the nmnber of synsets 
in a group constant, i.e., the upper/lower bound 
for a number of concepts is given as a criteria 
(ttearst,93). We call this the flat-size grouping 
method. In our implementation, we introduce a 
new grouping method called the flat-probability 
grouping method in which synset groups are speci- 
fied such that every group has the same class prob- 
abilities. One of the advantages of this method is 
that it is expected to give a grouping based on 
the quantity of information which will be suitable 
for the target task, i.e., semantic abstraction of 
triples. The degree of abstraction, i.e., the num- 
ber of groups, is one of the principal factors in 
deciding the size of the feature space and the pre- 
ciseness of the features (power of description). 
4.4 Deep triple abstraction 
Each synset of deep triples is abstracted based 
on the flat-probability grouping method. These 
abstracted triples are sorted and merged. Original 
synset IDs are maintained in this processing for 
feature extraction process. The result is called 
the abstracted deep triple set. 
Abstracted deep triple set 
(TYPE V-ABS-SYNSET NJ-ABS-SYNSET N2-ABS-SYNSFT 
V-SYNSEq'-LISq' N1-SYNSE'r-LIS'f N2-SYNSET-LIST 
SYN-FREQUENCY OCCURRENCE 
V-WORDS NI-WORDS N2-WORDS) 
Ex. (SO v28 n5 n9 
(v123 v224) (n5) (n9 n8) 5.3 32 C c! .... " "ru n 
"'after")C dog" "hound") C cat" "kit ty")) 
Synset "v28" is an abstraction of synset "v123" 
and synset "v224" which corresponds to "chase" 
and "run_after" respectively. Synset "ng" con:e- 
sponding to "cat" is an abstraction of synset "nS" 
corresponding to "kitty". 
4.5 Filtering abstracted triples by 
heuristics 
Since the current implementation adepts the 
"brute-force" approach, almost all massively gen- 
erated deep triples are fake triples. The filter- 
ing process reduces the number of abstracted 
triples using heuristics based on statistical data 
attached to the abstracted triples. There are 
three types of statistical data available; i.e., es- 
timated frequency, estimated occurrences of ab- 
stracted triples and lists of surface words. 
\[ler% the length of a surface word list associ- 
ated with an abstracted synset is called a surface 
support of the abstracted synset. A heuristics rule 
using some fixed frequency threshold and a surface 
support bound are adopted in the current imple- 
mentation. 
4.6 Common feature extraction from 
abstracted triple set 
This section describes a method for obtaining 
features of each synset. Basically a feature 
is typed binary relation extracted from an ab- 
stracted triple. From the example triple, 
(SO v28 115 n9 (v12a v224) (,,5) (,,9 ns) ,~,a a~ 
(" chase" "run "'after")(" dog" "hound") (" cat" "kitty")) 
the following features are extracted for three of 
the synsets contained in the above data. 
n5 (ov v28 n9 5.3 32 ("chase" "run"'after")("cat" "kitty")) 
i19 (sv v2S n5 5.3 32 ("chase" "run "'after" )(" dog" "hound" )) 
n8 (sv v28 n5 5.3 32 ("chase" "run"'after")('dog" "hound")) 
An abstracted triple represents a set of ex- 
mnples in the text corpus and each sentence in 
the corpus usually describes some specific event. 
This means that the content of each abstracted 
511 
triple cannot be treated as generally or univer- 
sally true. For example, even if a sentence "a 
man bit a dog" exists in the corpus, we cannot 
declare that "biting dogs" is a general property 
of "man". Metaphorical expressions are typical 
examples. Of course, the distributional semantics 
approach assumes that such kind of errors or noise 
are hidden by the accumulation of a large number 
of examples. 
However, we think it might be a more serious 
problem because many uses of nouns seem to have 
an anaphoric aspect, i.e., the synset which best fits 
the real world object is not included in the set of 
synsets of the noun which is used to refer to the 
real world object. "The man" can be used to ex- 
press any descendant of the concept "man". We 
call this problem the word-referent disambigua- 
tion problem. Our approach to this problem will 
be described elsewhc're. 
Preliminary experiments on 
feature extraction using 1010 
corpus 
In this section, our preliminary experiments of 
the feature extraction process are described. In 
these experiments, we examine the proper gran- 
ularity of abstracted concepts. We also discuss a 
criteria for evaluating filtering heuristics. Word- 
Net 1.4, 1010 corpus and Brown corpus are uti- 
lized through the exI)eriments. The 1010 corpus 
is a multiqayered structured corpus constructed 
on top of the FRAMEIX-D knowledge represen- 
tation language. More than 10 million words of 
news articles have been parsed using a multi-scale 
parser and stored in the corpus with mutual ref- 
erences to news article sources, parsed sentence 
structures, words and WordNet synsets. 
5.1 Experiment on fiat-probability 
grouping 
To examine the appropriate number of abstracted 
synsets, we calculated three levels of abstracted 
synset sets using the fiat probability group- 
ing method. Class probabilities for noun and 
verb synsets are calculated using the brute force 
method based on 280K nouns and 167K verbs ex- 
tracted fl'om the Brown eortms (1 million words). 
We selected 500, 1500, 3000 synset groups for 
candidates of feature description level. The 500 
node level is considered to be a lowest boundary 
and the 3000 node level is expected to be the tar- 
I)epth 1 2 3 4 5 6 7 8 
Synsets 611 122 966 2949 5745 12293 8384 7408 
Depth 9 10 11 12 13 14 15 16 
Synsets 5191 3068 1417 812 314 94 36 6 
Table 1. Depth/Noun_Synsels in WordNet 1.4 
Level 500 (518 synsets) 
1 (structure construction\](72\]9.47 4): a thing constructed; a 
con~.plex eonstruetioI'l or entity 
2 {time_period period period_of_tilne\](6934 3): a length ef 
time; "government services began during the colonial period" 
3 {organization\](6469.94 4): 
a group of people who work together 
4 {action}(6370.54 9): something done; 
5 {natural_object}(6277.26 3): an object occurring naturally; 
Level 3000 (3001 synsets) 
1 {natural language tongue mother tongue\](678.7 6): 
the language of a community~ 
2 {weapon arm weapon_system\](673.7(~ 6): 
used in fighting or hunting 
3 {head chief top_dog}(671.55 5): 
4 {capitalist}(669.45 4): 
a person who believes in the capitalistic system 
5 {point point_in_~ime}(669.29 8): a parti.cular clock time; 
Table 2: Synsets I)y flat-probal)illty grouping metho(1 
get abstraction level. This expectation is based on 
the observation that 3000 node granularity is em- 
pirically sulficient for deseribing the translation 
patterns for selecting the proper target Fmglish 
verb for one Japanese verb(lkehara,93). 
Table 1 shows the average synset node depth 
and the distribution of synset node depth of Word- 
Net1.4. Table 2 lists the top five noun synsets 
in the fiat probability groupings of 500 and 3000 
synsets. "{}" shows synset. The first and the sec- 
ond number in "0" shows the class frequency and 
the depth of synset respectively. 
Level 500 grout)ings contain a very abs|racted 
level of synsets such as "action", "time_period" 
and "natural_object". This level seems to be 
too general for describing the features of objects. 
In contrast, the level 3000 groupings contains 
"natural_language", "weapotf', "head,chief', and 
"point_in_time" which seems to be a reasonable 
basis for feature description. 
There is a relatively big depth gap between 
synsets in the abstracted synset group. F, ven in 
the 500 level synset group, there is a two-depth 
gap. In the 3000 level synset group, there is 
4 depth gap between "capitalist" (depth 4:) and 
"point_in_time" (depth 8). The interesting point 
here is that "point_in_time" seems to be more at). 
stract than "capitalist, " inluitively speaking. 
The actual synset numbers of each level of 
synset groups are 518, 15%8, and 3001. 'fhus 
the fiat probability grouping method can precisely 
control the lew'J of abstraction. Considering the 
possible abstraction levels available by the fiat- 
depth method, i.e., depth 2 (122 synsets), depth 
3 (966 synsets), depth 4 (2949 synsets), this is a 
great advantage over the flat probability grouping. 
5.2 Experiment: Abstracted triples from 
1010 corpus 
A preliminary experiment for obtaining abstract 
triples as a basis of features of synsets was con- 
ducted. 82,703 surface svo triples are extracted 
from the 101.0 corpus. Polarities of abstracted 
triple sets for 500, 1500, 3000 level abstraction 
are 1.20M, 2.03M and 2.30M respectively. Each 
512 
Level 1500 
1 {organization}{talk sp,:ak utter mouth verbulize vurbi~v} 
{ organization\] (70,4.24,1|)8) 
2 {organization){talk spcal~ utt<r inottth v, rl>aliz< ver/-,ify} 
{action}(5(;,'La5,112) 
3 {organization} {change ttndergo~a_change becozlLe_(liitk:rent} 
{l>ossession } (60,2.83,188) 
4 torgauization} {talk speak utter mouth vez'lmlizu vcrbi~,\] 
{ ....... ~t) (48,175,a4) 
5 {cn'ganiza*ion} {move displac ....... ke ......... } {action} 
(5(), ~ .s4,82) 
L,wel 3000 
I {c×pcrL} 
{greet z'(:cogniz~.:)/"cxpr<sa grc( ring:; up(m nlcc!ing ." 
{ due_process due_process_of_law} 
2 {jmT}/"a body of citi'zcns sworn to give a true verdict ." 
~l>ronoun(:e IM)el judge)/'})t'Ol'~Otll~('c, jttdgm, nt on" 
{capit~alist} (4,11.09,4) 
3 {police police_force constal)ulary law} 
{allege aver say}/"lt( alleged ltlat w( was the victim " 
{female fen:~alc_pel'son} (4,\].,3) 
4 {assc'nfl)ly}"a body that holds formal n~(x.lings" 
{refuse l't~jccL l),'/sS_tll3 Ltlrll_(Iown (h<:\[in(!}/" rcfllsO I;0 ~t(:('c\])t;" 
{r,:quest petition solicitation} ((;,0 25,G) 
5 {animal animate_b<:ing /)~;ml I)rttt0 Cl'c~tlttl'u ft~llllH}/ 
{win gaitL}/'win somvthiug dlrough one's; ,.ll\~vls" 
{ contest comt)etit ion\] ( 5,0 ..19,6) 
"()" .+I ...... (# ,,f s,,,,r ......... s,,pl,,,~+~ ,l',.,,q,,,.,,y:o ....... .i,,,, ....... ) 
TaMe 3: ~xamph! of abstracted triple.'+ 
abstract triple holds ft:equeu(:y, oc<:llJ'lX)llO(, lltl/ll-. 
be G and woful list, which mqq~(a't.s each of thce(~ 
al)st ra(:ted sy nsel:s. 
A lilt ering heuristic that elin~htates al:+sll'a<:t 
trit,les whose stlr\['ac(; Sul)pOrl is three (i.e., sup- 
ported })y only one sm'face \])~I~LC\]:II) iS al>plicd to 
each set of al)sLracLed Iril)les , ;111(1 l;(.','-.;tl\[I.s iu the 
R)llowing sizes of at)stract:ed triple sets in the 379K 
(level 500), 150b: (level 1500) and 561,: (\]ewq 3000) 
respectively. F, ach triple is assiglted a evaluation 
score which is a snt|, of m)rnmlized surface SUl)l)(~rL 
score (:: Sll.l:f3e('. Sllt)l)orl; s<:ore/tl-taXilHtllll ,qlll'I'~/ce 
SUl)l)orL score) ;+tim normalized \[\]:e(luet~(;y (~ fre+ 
(luency / nmxi/unm fi'equency). 'l'at)le 3 shows 
the top \[be abstra<'ted tril)les with respect to dw+ir 
ewduaLiot~ scores, ltetns in the talJe shows subject 
syllseL, vert)SyllSel;, oh j0el; synseL, sttrfa<:e sup- 
l)Orl;, fr<'qtleltcy ~tll(\] oc('.ll\])re\]lC(~ IlllIlI\])0I!S. 
All the sul<iccl;s in the top five al)sLract triples 
of level 500 are "organizal;iotf'. This seems to be. 
r0asonal)le bee;rose the COlll;eltl;s of the 10\] 0 corpus 
are news articles ~tt(t l:hese triples seem to show 
some highly abstract, briefing of the cont.ent, s of 
the corpus. 
The clfcclAveness of the filtering ;re(l/or scoring 
hcuri6dcs ca, n bc tl:l(~a,,stlr(;(l ttsilt~ tv¢(~ ch)scly re-. 
lated criteria. One measm:es the l)lausitfility o\[' 
al)stract.ed triple,s i.e., the r(x'all and l)cecision |'a-+ 
I;io of the l)\]ausible at)straeted Lriples. 'l'he other 
criteria shows the correctness of the mappings of 
die surface t;riple I)atLerns to abstracted tril)les. 
varsity \[htiled_Nations t:ealn subsidiary State state staff so+ 
vier school l'olitburo police patrol party palml Organization 
OI'(\[CI" operation lle!'vVSl)~/,})Vl' lilissioll Ministry lll(:II21)t'F lll\[tg\[~- 
zine lin(: law_firm law hind 3u:~tice_l)epartmcnt jury industry 
hOL|S(? h<~ztd(tual't,.'I'S govut'illI/tHlt g?tllg I:tlA division COlllq ,:'OllH- 
try co/lllcil Collf(!l'eltc(! <:(Jllll)tllly ('ommitlct (:ollcge (:ht\]2 Cabi- 
net business board associ~tion Association airline 
Table 4. SIIrflk(?( ~. Snplmrt, s of "orgmdzation" 
This is measured I)y counting the eon:ect surface 
supports of each absl.racted triple, l"or example, 
considering a set of sm:l';u:e words sut~port.ing "o> 
ganization" of Lhe I of level ,500 shown in table 4, 
the word "panel" rnight loe used as "panel board". 
'l'his abilily is also measm:ed by developing the 
word sense dismnbiguator whic.h inputs the sur- 
fa(:e tril)le and select:s lhe most l~\[ausil)le deep 
Iril)le based ou abstracted triple scores matched 
with the deep triple, 'Flm surface SUlh~octs iu 'l';t-- 
hie 4 show the intuitNe tendency that a suftlcient 
number of triple data will generate solid results. 
6 Conclusions 
This paper described a simil+~rity calculaik)n 
model between ol),je+cl.s based on commoz~ and dis- 
l inctiwe feal, ures ;-mcl prol)oses an hnplementation 
l>rocedu re \[br obtaining feat;ures based on al>stract 
lriples extracted l}om a large text <:orpus (1010 
corpu,~) utilizing taxonomical km)wle(lge (Word- 
Net). The exl)eritt|ettL , which used around 801{ 
SLlrfaee triples, shows l,}lal; t, he abstraction level 
3000 l)rovi(le.s a good basis for \['eal;ttre. (les<-zit)- 
lion. A feal;m'e extra(:tion eXl)erhnent based (m 
large trii)\]c (\[a~a is ()tic next. goal+ 
l{eferences 
\](~:nneth (Thurch mid Pal;rick Hanks. t98,9, l/Vo~d assoc2- 
at~on norms, ~nnt~al inj'orm,ttio.n, and h?.rlcogvaphy , \]n Pro- 
cccdings of th(. '27th Annual Me+~tittg of A(:I, 
E I)IL 1995. >'.umma~?/ for the I';I)\]~ Ele<:tronw l)ict'~o?~arg 
Version l Technical <:'lz~de El)If T1~2-005, Japan I';\]ectronic 
I )ictionary II.cs~ arch Iiml:il ul % 'l'okyo. 
(h'egory Grctbnstettc 1994. I';xptovations in A~ztotnatic 
7'k+saurus l)'~scovc*y, /(luwur Academic Publishers. 
Marti. A. Ilcarst and Him'lob S<hutze. \]9.q'l. (:~zs~om+znl9 
a Lc:!:~con to Hettcr .%tzt a Coml)utat'wnal Task. Proco.dmgs 
of the ACf, SIGI,I.;X Workshop, (hJmnbus, Ohio. 
\])cmald llindh. 19q0. No~z*t classzJicativn ~(r;;m pred'u:ale 
i~<q~t?ncn~ st?ltCttZ?t:.% Proceedings of life 28th Anntt~d Meeting 
of A( :1, 
S. Ikthm';t, M. Miyazaki, A. Yokoo. 1993. ('lass~firat'~o~ 
of J)a~.jtitt~je \[(llo2ulrtdfJ(! fo~' Mc a'lr*ny .Anahjs'ls Jn A.hz(h.zne 
TtamsJat~on, Transactions of h/lk)z'mal;ion t'rocesMng Society 
of Jal>aP, , Vol 34, No.8, pl>s. \]692-1704. 
J. Kolodner and C. H.iesbeck. Cast>l\]ased Rtasott~n~h 
tutorial t.xtbook of \] lth IJCAI. 
(;,org( A. Miller, ILichard ileckwith, Christiane t"el\[bmtm, 
I)tl'C\]( (\]I'ONS I Kad~crine Millet< 19!)0. l"~ve Papers on Word- 
N,t, (k)gnilivc Sciunce \],M)or;~tory ILeport 43, Princeton tJni- 
vcrsity. 
lloy Rada, Hafeclh Mill, \]"llen Bicknell, Maria Bh'ttner. 
1989. Z)eualvpme'nt alzd Apphcat+oa of a Metric on 5rc;tlaltltc 
N+ts, \]I';I';E qYansacl:ions on Systems~ MmG :rod ('yberncti,'s, 
\%1. tg, No. I. 
Phi|ip |tt:,nnik. l!10,gzt. ?i,'sZnCl info?~nat+o~t ('o~ttcrlt to l",val- 
tizzY+ >'cm~u~tic Szmilarity i~t a Taxonomy, Pz'occedings of 
1J(:A\[-95. 
Philip liesnik. 1!)!151) I)'lsamb~guating No~t~ (,'roul.ulgs 
w~th \]{espcct to I/VordNet 5'e'as+~s, Proceedings of Annual 
M(.cting of ACt,. 
lIinz'ich Schutze. 1993. Adva'~wes in Neural lTzforn~at~on 
tbocessing ,<'gste*ns 5, Stephen ,/. \[\[anson, Jack D. Cowan, 
(:.i,ce Files editors, Morgmt Kaufmmm, San Marco (?A. 
Michael SklSSll/t. 1993. Wo~d :'ensc k)'~sambiguation for 
t"rce-temt hzdccqnq Using a Massive Semantic Network, P\['o- 
cccdings of the S,,?(:on(\[ \[nterlmtional Conf~rcn(:e ,:)n \[nfornla- 
don and l(zzowl,.dge Managument (CIKM-93) 
Amos Twrsky 1977. \["aat+tres Of 5'imHa~ity. l'sychological 
t{-view, Vol. 84, Number 4. 
513 
