Clustering Co-occurrence Graph based on Transitivity 
Kumiko TANAKA-Ishii
Electrotechnical Laboratory 
1-1-4 Umezono, Tsukuba, Ibaragi 305 JAPAN.
kumiko@etl.go.jp
Hideya IWASAKI 
Tokyo Univ. of Agriculture and Technology 
2-24-16 Naka-cho, Koganei, Tokyo 184 JAPAN.
iwasaki@ipl.el.tuat.ac.jp
Abstract 
Word co-occurrences form a graph, regarding 
words as nodes and co-occurrence relations as 
branches. Thus, a co-occurrence graph can be 
constructed by co-occurrence relations in a cor- 
pus. This paper discusses a clustering method 
of the co-occurrence graph, the decomposition 
of the graph, from a graph-theoretical
viewpoint. Since one application of the
clustering results is ambiguity resolution,
each output cluster is expected to have no
ambiguity and to be specialized in a single
topic. We observed that a graph has no
ambiguity if its branches representing
co-occurrence relations are transitive. An
algorithm to extract such graphs is proposed,
and the uniqueness of its output is discussed.
The effectiveness of our method is examined by
an experiment using a co-occurrence graph
obtained from a 30-Mbyte corpus.
1 Introduction 
Clustering is the operation to group words by 
some criterion. Thesauri and synonym dictio- 
naries are some of its manual examples. Auto- 
matic outputs can be used not only to revise 
them, but also to aid ambiguity resolution, an 
essential problem in natural language
processing. For instance, the meaning of an
ambiguous word can be decided by examining the
cluster it belongs to. Furthermore, clusters
grouped according to topics have many
application areas, such as automatic document
classification. The input in this paper is the
word co-occurrence graph obtained from a
corpus. The output is its subgraphs, under the
condition that each subgraph is specialized in
a topic.
Many automatic clustering methods have
already been proposed. Most of them are based
on the statistical similarity between two
words. Our approach is different: it is graph
theoretical. We try to find the special
structure of linguistic graphs.
Having a huge co-occurrence graph obtained
from a corpus, we first tried to decompose it
and analyze its structure using
graph-theoretical tools, such as maximum
strongly connected components or biconnected
components. Although both tools decompose a
graph into tightly connected subgraphs, these
trials were in vain. The question arose: what
must be taken into account to decompose the
co-occurrence graph? The answer is ambiguity.
Furthermore, we reached the conclusion that
ambiguity can be explained in terms of
intransitivity. This feature is developed into
an algorithm for clustering.
This paper is organized as follows. The
following section describes the relationship
between transitivity in the graph and
ambiguity resolution. Section 3 shows the
relationship between clustering and
transitivity. Section 4 proposes and discusses
an algorithm for clustering. Related work is
summarized in Section 5. Our method is
examined in Section 6 by experiments.
2 Word Ambiguity and Transitivity 
Two words are said to co-occur when they fre- 
quently appear close to each other within texts. 
Regarding words as nodes and co-occurrence
relations as branches, a graph can be
constructed from a given corpus. We call such
a graph a co-occurrence graph.
When a portion of a corpus specializes in a
topic, we can still extract a co-occurrence
graph from that portion. A general corpus,
such as a newspaper corpus, contains many
portions, each specializing in one topic.
Therefore, the whole co-occurrence graph
obtained from a general corpus contains
subgraphs, each specializing in one topic. Our
problem is to extract such topic subgraphs
from a co-occurrence graph.
We denote V as the set of nodes (words), E
as the set of branches (co-occurrence
relations), $G = \langle V, E \rangle$ as an
input graph and $|V|$ as the number of nodes.
English words referred to as examples are
written in this font.
2.1 Transitivity in Co-occurrence Relation
The most basic mathematical laws discussed
about relations between elements in a set are
the reflexive, symmetric and transitive laws.
Given $a, b, c \in V$ and a relation $R$, they
can be described as follows:
Reflexive $aRa$.
Symmetric $aRb \Rightarrow bRa$.
Transitive $aRb, bRc \Rightarrow aRc$.
Let V be the word set and R be the
co-occurrence relation. When each property
holds for R, it can be explained as follows
from the linguistic viewpoint:
Reflexive Word a co-occurs with itself.
Symmetric The co-occurrence relation does not
depend on the occurrence order.
Transitive Word b does not have two-sided
meanings (ambiguity). For instance, doctor,
which has both medical and academic meanings,
co-occurs with nurse within a medical topic,
and co-occurs with professor within an
academic topic. However, nurse and professor
do not co-occur, so the transitivity among
nurse, doctor and professor does not hold.
Figure 1: Graph Decomposition (panels (1) to (8); legend: anchor branch, duplicate branch)
Our goal is to extract subgraphs, each of
which focuses on one topic with no ambiguity.
Therefore, we perform clustering by extracting
subgraphs whose branches form transitive
co-occurrence relations.
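As a small illustration of the transitivity test, the sketch below lists the triples in which the law fails. The function name `intransitive_triples` and the representation of branches as word pairs are assumptions made for this example, not part of the method itself.

```python
from itertools import combinations


def intransitive_triples(branches):
    """Return (a, b, c) where a-b and b-c are branches but a-c is not."""
    adj = {}
    for u, v in branches:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    triples = []
    for b in sorted(adj):
        # Every pair of neighbours of b is a candidate a, c.
        for a, c in combinations(sorted(adj[b]), 2):
            if c not in adj[a]:
                triples.append((a, b, c))
    return triples


# The doctor example: nurse and professor do not co-occur.
branches = [("doctor", "nurse"), ("doctor", "professor")]
print(intransitive_triples(branches))
```

The middle word of each reported triple is a candidate ambiguous word; a graph with no such triple is transitive in the sense used here.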
3 Transitivity and Clustering
3.1 Decomposition and Duplication 
The simplest case is a graph of three nodes. 
Figure 1-(1) is a graph in which the transitivity 
does not hold. For example, when b is doctor, 
a is nurse and c is professor, nurse and professor 
do not co-occur due to the node b's two-sided 
meanings. Therefore, as in Figure 1-(2), we
"duplicate" b so that each duplicated node
corresponds only to a single meaning. Then the
ambiguity within b is resolved and the entire 
graph is divided into two subgraphs, the aca- 
demic one and the medical one. To sum up, 
when the transitivity does not hold, a graph 
can be decomposed by duplicating the ambigu- 
ous node. 
On the other hand, when the transitivity
holds among three nodes (Figure 1-(3)), the
graph cannot be decomposed by duplication of
b (Figure 1-(4)). This is explained by the
fact that the graph has no ambiguity.
We extend the above to the case of four
nodes (Figure 1-(5)). Here, transitivity does
not hold in a-b-d because there is no branch
between a and d. When b-c is duplicated, the
graph can be decomposed into two subgraphs
(Figure 1-(6)) in which the transitivity
holds. On the contrary, Figure 1-(7) cannot be
decomposed by duplicating b-c due to the
branch a-d (Figure 1-(8)); this shows that b-c
is not ambiguous. Note that Figure 1-(7) is a
complete graph of 4 nodes. We define a
duplicate branch as a branch to be duplicated
for graph decomposition (such as b-c) and an
anchor branch as a branch which inhibits graph
decomposition by duplication (such as a-d).
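The four-node case can be checked mechanically. The sketch below is a minimal illustration, assuming that an anchor branch for b-c is a branch joining two nodes that are each adjacent to both b and c; the helper names `build_adjacency` and `anchor_branches` are introduced here for the example only.

```python
from itertools import combinations


def build_adjacency(branches):
    """Symmetric adjacency sets from a list of branches."""
    adj = {}
    for u, v in branches:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def anchor_branches(adj, e):
    """Branches between two common neighbours of the endpoints of e."""
    b, c = e
    common = sorted((adj[b] & adj[c]) - {b, c})
    return [(a, d) for a, d in combinations(common, 2) if d in adj[a]]


# Figure 1-(5): no branch a-d, so b-c has no anchor and can be duplicated.
four = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]
print(anchor_branches(build_adjacency(four), ("b", "c")))

# Figure 1-(7): the branch a-d anchors b-c.
print(anchor_branches(build_adjacency(four + [("a", "d")]), ("b", "c")))
```

An empty result marks b-c as a candidate duplicate branch; a non-empty result means that decomposition at b-c is inhibited.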
In general, when a graph cannot be separated
by duplicating one of its subgraphs, that
subgraph is regarded as having no ambiguity.
Therefore, ideal clustering is to decompose
graphs into subgraphs which cannot be
decomposed further by duplication.
Unfortunately, this constraint is too strict,
because such a graph is restricted to a
complete graph. In addition, extracting
complete graphs within a given graph is
NP-complete. Therefore, we discuss in the
following how to loosen the constraint.
3.2 Transitive Graph 
There are two methods to loosen the
constraint.
The first is to decrease the number of anchor
branches. In a complete graph of more than 5
nodes, several anchor branches exist for each
duplicate branch (Figure 2-(1)). However, only
one anchor branch is sufficient to inhibit the
decomposition. The fewer anchor branches are
required, the looser the constraint is.
This intuitively corresponds to loosening the
sharpness of the topic focus in the resulting
cluster. For instance, the two words pneumonia
and cancer do not always co-occur, but they do
co-occur with words such as doctor, nurse and
hospital, which form the core of medical
topics. Pneumonia will be included in a
cluster if it is connected with these three
words, even if it is not connected with
cancer. If cancer is also connected with these
three words, both cancer and pneumonia, words
of different subtopics within a medical topic,
are included in one cluster.
Figure 2: Loose Constraint for Graph Decomposition (panel labels: anchor distance 2, anchor distance n)
The second is to loosen the transitivity
itself. It was defined in Section 2.1 within
three nodes. We may prepare a loose
transitivity as follows:
$v_1 R v_2, \ldots, v_{n-1} R v_n \Rightarrow v_1 R v_n$
We define the anchor distance as the maximum
of the minimum distances of a-b-d and a-c-d.
For example, when the minimum distance of
a-b-d is 4 and that of a-c-d is 6, the anchor
distance is 6. The tightest constraint is when
the anchor distance is 2, as in Figure 2-(3).
Enlarging the anchor distance also blurs the
topic focus of a cluster. In the example of
pneumonia, the word will be included if it is
connected directly with at least one of the
words among doctor, hospital, nurse and
cancer, and connected indirectly with the
others.
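The anchor distance can be sketched with breadth-first search. This is one possible reading, stated as an assumption: the minimum distance of a-b-d is taken to be the shortest-path length from a to b plus that from b to d, and likewise for a-c-d. The function names are hypothetical.

```python
from collections import deque


def build_adjacency(branches):
    """Symmetric adjacency sets from a list of branches."""
    adj = {}
    for u, v in branches:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def bfs_distances(adj, source):
    """Shortest-path length (in branches) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def anchor_distance(adj, e, anchor):
    """Maximum of the shortest a-b-d and a-c-d lengths (assumed reading)."""
    b, c = e
    a, d = anchor
    inf = float("inf")
    from_b = bfs_distances(adj, b)
    from_c = bfs_distances(adj, c)
    via_b = from_b.get(a, inf) + from_b.get(d, inf)
    via_c = from_c.get(a, inf) + from_c.get(d, inf)
    return max(via_b, via_c)


# a reaches b only through x, so a-b-d has length 3 while a-c-d has length 2.
branches = [("b", "c"), ("a", "d"), ("a", "c"), ("c", "d"),
            ("a", "x"), ("x", "b"), ("b", "d")]
print(anchor_distance(build_adjacency(branches), ("b", "c"), ("a", "d")))
```

When a and d are both adjacent to b and to c, as in a complete graph of four nodes, the function returns 2, the tightest case.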
For $m, n < |V| - 1$, G is called an
(m, n)-transitive graph when
for all $e \in E$, there are m anchor
branches $e' \in E$ of anchor distance
$\le n$.
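The definition can also be written compactly. The formula below is a sketch under two assumptions: "there are m anchor branches" is read as "at least m", and $\mathrm{ad}(e, e')$ is a symbol introduced here for the anchor distance of $e'$ with respect to $e$.

```latex
G = \langle V, E \rangle \text{ is } (m, n)\text{-transitive}
\iff
\forall e \in E:\;
\bigl|\{\, e' \in E \mid \mathrm{ad}(e, e') \le n \,\}\bigr| \ge m
```

Under this reading, a (1, 2)-transitive graph is one in which every branch has at least one anchor branch at anchor distance 2.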
(m, n)-transitive graphs can be extracted as
subgraphs of the input graph. Figure 3 shows
a map of (m, n)-transitive graphs. The
vertical axis describes the number of anchor
branches (m). The horizontal axis describes
the anchor distance (n). The constraint is
loose when m is small and n is large. For the
same input, $(m_1, n)$-transitive graphs are
included in $(m_2, n)$-transitive graphs when
$m_1 < m_2$, and $(m, n_1)$-transitive graphs
are included in $(m, n_2)$-transitive graphs
when $n_2 < n_1$.
Figure 3: (m, n)-Transitive Graph (horizontal axis: anchor distance n, from tight to loose; vertical axis: number of anchor branches m; marked graphs: GS0 (complete graph), GS1, GS2)
GS2 in Figure 3 denotes the clusters obtained
under the loosest constraint: m is the minimum
and n is the maximum. In GS2, all ambiguity of
a branch and of the nodes at its ends is
resolved. GS0 denotes the transitive graphs of
the tightest constraint. All transitive graphs
on the horizontal line including GS0 are
complete.
4 Extraction of Transitive Graphs 
So far, we have not explained how to detect
the duplicate and anchor branches in a given
graph. An algorithm for clustering can be
top-down or bottom-up. The former gives
clusters by decomposing the input graph
through the detection of duplicate branches.
Although we have explained our clustering 
method top-down up to now, we propose our 
clustering method as bottom-up. We obtain 
clusters by accumulating adjacent nodes so 
that every branch has anchor branches and the 
resulting clusters include no duplicate branch. 
Thus, in the bottom-up method, we need not
detect duplicate branches. This is convenient, 
because the condition for anchor and dupli- 
cate branches is denoted by local relationships 
among nodes. 
The branches in the input graph are all
assumed to be symmetric. In this section, we
use the term clusters for our output and
subgraphs for their candidates.
4.1 Definition of Clusters 
We extract GS1, the (1, 2)-transitive graphs.
A subgraph A including a branch e in the
input graph can be extracted as follows:
Step 1. Put a triangle graph including e into
A.
Step 2. Take a branch $e'$ in A and a node v
which makes a triangle with $e'$ (Figure 4).
If the following condition is satisfied, put
v into A.
There exists a node $v' \in G$ (input graph)
whose distance from $e'$ is 1, and it is
connected to v with a branch. Here, the
branch $v'$-v is the anchor branch, so that
$e'$ is prevented from being a duplicate
branch in the resulting cluster.
Additionally, put every branch between
$v'' \in G$ and v into A.
Step 3. Repeat Step 2 until A cannot be
extended.
Performing the above procedure starting from 
every branch in the input graph, we obtain 
many subgraphs. Considering the inclusion 
relation between subgraphs, they constitute a 
partial order (Figure 5). We define clusters as 
maximal subgraphs in this partial order chain. 
They are subgraphs not included in any other 
subgraphs. The uniqueness of the clusters for 
an input is self-evident. 
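The extraction procedure for one starting branch might be sketched as follows. Several points are assumptions of this sketch rather than statements of the text: the subgraph A is tracked as a node set whose branches are those induced from the input graph, "distance 1 from e'" is read as "adjacent to both ends of e'", and the first triangle in sorted order is taken in Step 1. The names `build_adjacency` and `grow_subgraph` are hypothetical.

```python
from itertools import combinations


def build_adjacency(branches):
    """Symmetric adjacency sets from a list of branches."""
    adj = {}
    for u, v in branches:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def grow_subgraph(adj, e):
    """Grow a node set from branch e; returns None if e lies on no triangle."""
    x, y = e
    thirds = sorted(adj[x] & adj[y])
    if not thirds:
        return None
    nodes = {x, y, thirds[0]}            # Step 1: a triangle including e
    changed = True
    while changed:                       # Step 3: repeat until no extension
        changed = False
        members = sorted(nodes)
        for i, p in enumerate(members):
            for q in members[i + 1:]:
                if q not in adj[p]:
                    continue             # (p, q) is not a branch e' of A
                common = adj[p] & adj[q]
                # Step 2: v makes a triangle with e' = (p, q)
                for v in sorted(common - nodes):
                    # Anchor test: some v' adjacent to p, q and v.
                    if any(v in adj[w] for w in common if w != v):
                        nodes.add(v)
                        changed = True
    return frozenset(nodes)


# Two topics sharing the ambiguous word doctor.
medical = ["doctor", "nurse", "hospital", "patient"]
academic = ["doctor", "professor", "university", "student"]
branches = list(combinations(medical, 2)) + list(combinations(academic, 2))
adj = build_adjacency(branches)
print(sorted(grow_subgraph(adj, ("nurse", "hospital"))))
print(sorted(grow_subgraph(adj, ("doctor", "professor"))))
```

In this toy input, doctor ends up in both subgraphs while the medical and academic words stay apart, which is the intended effect of duplication.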
4.2 Algorithm for Clustering 
In the previous section, the procedure to
obtain subgraphs begins from every branch in
the input. However, it is sufficient to
calculate as follows.
Step 0. $i = 0$.
Step 1. Choose a branch $e \in G$ not included
in $G_0, \ldots, G_{i-1}$. If no e is found,
go to Step 5. $G_i = \langle \emptyset, \emptyset \rangle$.
Put a triangle graph including e into $G_i$.
Step 2 and 3. Extend $G_i$ using Steps 2 and 3
of the previous section.
Step 4. Set $i = i + 1$ and go to Step 1.
Step 5. Examine every pair of subgraphs (A,
B), and if A includes B, then drop B. The
remaining subgraphs are defined as clusters.
Figure 4: Extraction of (1,2)-Transitive Graph
Figure 5: Subgraphs and their Partial Order
A maximal subgraph cannot be missed. Its
starting branch is encountered without fail in
the above algorithm. If it is encountered as
the starting branch in Step 1, the maximal
subgraph is obtained. If it is captured into
a subgraph and becomes $e'$ in Step 2, the
subgraph extends to the size of the maximal
subgraph; if it grew larger, that would
contradict the maximality established in the
previous section.
The algorithm halts since the input graph is
finite, and the output is unique for an input.
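The outer loop (Steps 0, 1, 4 and 5) can be sketched separately from the extension step. In the sketch below, `grow` is a hypothetical callable standing in for Steps 1 to 3: it takes a branch and returns a node set, or `None` when the branch lies on no triangle. A branch is treated as "included" in an earlier subgraph when both of its ends are in that subgraph's node set, which is an assumption of this sketch.

```python
def maximal_subgraphs(subgraphs):
    """Step 5: drop every subgraph that is properly included in another."""
    unique = set(subgraphs)
    return [g for g in unique if not any(g < h for h in unique)]


def cluster(branches, grow):
    """Run Steps 0 to 5, delegating Steps 1 to 3 to grow(e)."""
    subgraphs = []
    for u, v in branches:
        # Step 1: skip branches already included in an earlier subgraph.
        if any(u in g and v in g for g in subgraphs):
            continue
        g = grow((u, v))                 # Steps 2 and 3
        if g is not None:
            subgraphs.append(frozenset(g))   # Step 4: move on to the next i
    return maximal_subgraphs(subgraphs)


# Stand-in for the extension step: a fixed lookup table (illustration only).
table = {
    ("a", "b"): {"a", "b", "c"},
    ("a", "d"): {"a", "b", "c", "d"},
}
branches = [("a", "b"), ("a", "d"), ("b", "c"), ("e", "f")]
print(cluster(branches, table.get))
```

Here the subgraph {a, b, c} is dropped in Step 5 because {a, b, c, d} includes it, and the branch b-c never starts an extraction because it is already covered. The order of the returned list is not significant.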
5 Related Work 
[Li and Abe, 1996] compared clustering methods
proposed so far [Hindle, 1990]
[Brown et al., 1992] [Pereira et al., 1993]
[Tokunaga et al., 1995] [Li and Abe, 1996].
Most of them are so-called hard clustering:
each word is included in only one cluster. We
do not follow this trend, in the sense that
our objective is the extraction of clusters of
topics. It is natural that an ambiguous word
should be included in different clusters.
[Pereira et al., 1993] adopts soft clustering.
They measured co-occurrence between nouns and
verbs, and clustered nouns with the same
distribution of verbs.
[Fukumoto and Tsujii, 1994]'s work has a
common motivation with ours: the ambiguity
should be resolved for clustering. They
clustered verbs using the gravity of
multivariate analysis.
[Sugihara, 1995]'s approach has a common
point with ours in that it focuses on graph
structure for clustering and tries to
structure the input graph, a bipartite graph
of words and concepts (such as food, fruit,
etc.). His clustering method is the so-called
Dulmage-Mendelsohn decomposition in graph
theory. The output naturally gives a partial
order of clusters which can be compared with
conventional thesauri.
Our input is not bipartite. In the beginning,
we tried to decompose the input graph into
maximum strongly connected components (MSCC)
to obtain graphs of topics, from the
observation that nodes in a cycle are strongly
related (see footnote 1). However, subgraphs
about different topics are merged into the
same cluster by two ambiguous words which
bridge these two subgraphs (Figure 6). Next,
we observed that articulation nodes are
ambiguous, so we performed decomposition into
biconnected components. In this case, when
several biconnected components are connected
in a ring, articulation nodes cannot be
detected (Figure 7). The observation that
there is no co-occurrence relationship between
two biconnected graphs across the articulation
node was the starting point of this paper.
Footnote 1: [Tokunaga and Tanaka, 1990] discusses the extraction of cycles formed by translation relations from a bilingual dictionary.
Figure 6: Problematic Structure in MSCC Clustering (labels: ambiguous word, clusters of different topics)
Figure 7: Problematic Structure in Biconnected Component Clustering
6 Experiments 
6.1 Procedure of Clustering 
First, we make the input graph from 30 Mbytes
of Wall Street Journal text. Co-occurrences of
nouns and verbs are extracted with a
morphological analyzer. We define a word as
co-occurring with each of the 5 words that
follow it within a sentence. The co-occurrence
degree is measured by mutual information
[Church and Hanks, 1990]. We set a certain
threshold on the values to extract the input
graph.
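The construction of the input graph could look roughly like the sketch below. It assumes that sentences are already tokenized and reduced to nouns and verbs, that word pairs are unordered, and that mutual information is estimated as $\log_2 \frac{f(x, y)\, N}{f(x)\, f(y)}$ with N the number of word tokens; the estimator details and the function name `cooccurrence_graph` are choices made for this example.

```python
import math
from collections import Counter


def cooccurrence_graph(sentences, window=5, threshold=3.5):
    """Return the set of word pairs whose mutual information reaches threshold."""
    word_count = Counter()
    pair_count = Counter()
    for words in sentences:
        word_count.update(words)
        for i, w in enumerate(words):
            # A word co-occurs with the `window` words that follow it.
            for other in words[i + 1:i + 1 + window]:
                if other != w:
                    pair_count[frozenset((w, other))] += 1
    n_words = sum(word_count.values())
    branches = set()
    for pair, count in pair_count.items():
        a, b = tuple(pair)
        mi = math.log2(count * n_words / (word_count[a] * word_count[b]))
        if mi >= threshold:
            branches.add(pair)
    return branches


sentences = [["doctor", "nurse"], ["doctor", "nurse"],
             ["bank", "loan"], ["bank", "loan"]]
print(cooccurrence_graph(sentences, threshold=1.5))
```

Raising `threshold` removes weaker branches and so splits the graph into smaller pieces; lowering it has the opposite effect.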
The number of resulting clusters depends on
the input graph as follows:
1 One huge connected subgraph.
2 Several medium sized subgraphs.
3 Many small sized subgraphs.
When the threshold value is too high, the
output is 3. On the contrary, when it is too
low, the output becomes 1. Neither 1 nor 3 is
interesting, because 1 is a graph including
all topics, and 3 generates graphs of topics
too small to check the global trend of topics
in the input. Therefore, we varied the
threshold from 1.0 up to 7.0 in steps of 0.5
to make the input graph, and applied our
algorithm to each input in order to detect the
best threshold.
Figure 8: Threshold and the Output (vertical axis: number of clusters of size more than 7; horizontal axis: threshold)
The result is shown in Figure 8. The number of
clusters whose sizes are more than 7 is
plotted against the threshold value. When the
threshold is 3.5, such clusters were most
numerous. The 39 clusters contained 727
different words out of the 15347 in the input
graph.
6.2 Evaluation of Clusters 
The Appendix lists the contents of the 39
clusters, sorted by size. Words judged
inappropriate in each cluster are marked "†".
Words whose suitability for their clusters is
undecidable are marked "‡".
Each of the 39 clusters is annotated with four
items as follows:
• Subjectively judged topic
• Cluster size (CS): number of words in a
cluster
• Error rate (ER): rate of inappropriate words
(marked "†")
• Uncertainty rate (UR): rate of uncertain
words (marked "†" and "‡")
The averages of the above items were CS=20,
ER=10.3% and UR=14.7%; hence, the ambiguity
was removed from clusters up to 85% on
average. Only 1 cluster had an inestimable
topic. Two factors, CS and UR, make the
estimation of a topic difficult. When CS is
too small, the cluster itself lacks
information even when UR is 0.0. When UR is
high, it is natural that the topic becomes
inestimable.
6.3 Evaluation of Words Contained in Two or More Clusters
The number of words belonging to two or more
clusters amounts to 57. They are classified
as follows (numbers in parentheses are cluster
numbers in the Appendix):
1. A word with different meanings (10 words).
cell (1, 15), ice (3, 8), panel (1, 7),
treat (1, 3)
2. A word with the same meaning but used in
different contexts (32 words).
star (9, 12, 14), brand (3, 22)
3. A word with the same meaning in the same
context (7 words).
4. Others (one of the words is uncertain, or
its cluster's context is not estimated; 6
words).
Words of class 1 are the ambiguous words. Cell
in Cluster 1 means cells of tissue, whereas
that in Cluster 15 means battery. Ice in
Cluster 3 means ice for cooling beverages,
whereas that in Cluster 8 means ice to skate
on.
For our objective of obtaining subgraphs of
topics, it is quite important that words in
class 2 be duplicated. For instance, star in
Cluster 9 is a sports star, that in Cluster 12
is a singer and that in Cluster 14 is a movie
star. If star were not duplicated, the three
different topics would be merged into a single
subgraph. The same situation is observed for
children: it would merge the topics of
childbirth and education into one graph if it
were not duplicated. We are apt to pay
attention only to the words of class 1, but
those of class 2 play an important role in
clustering.
Words in class 3 are not ambiguous: they
should connect two subgraphs into one (see
Section 7).
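Finding the words that appear in several clusters is a simple inversion of the cluster membership. The sketch below uses only the memberships quoted in this section for cell, ice, panel, treat and brand; the actual clusters contain many more words, and the function name `shared_words` is hypothetical.

```python
from collections import defaultdict


def shared_words(clusters):
    """Map each word found in two or more clusters to its sorted cluster ids."""
    membership = defaultdict(list)
    for cluster_id, words in clusters.items():
        for w in set(words):
            membership[w].append(cluster_id)
    return {w: sorted(ids) for w, ids in membership.items() if len(ids) >= 2}


# Partial memberships only, taken from the examples in the text.
clusters = {
    1: ["cell", "panel", "treat"],
    3: ["ice", "treat", "brand"],
    7: ["panel"],
    8: ["ice"],
    15: ["cell"],
    22: ["brand"],
}
print(shared_words(clusters))
```

Deciding which of the four classes a shared word belongs to still requires human judgment about the topics of its clusters.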
6.4 Cluster Hierarchy 
An output subgraph at a higher threshold is
included in one at a lower threshold. With
this inclusion relation, the clusters form a
hierarchy (Figure 9). A part of the hierarchy
is shown below:
Threshold 3.75 
A admission college scholarship 
B admission applicant college
C campus children classroom college edu- 
cation enroll faculty grade math parent 
scholarship school student sugar teach 
teacher tuition tutor university voucher 
D birth child children couple marriage 
marry parent wedlock woman 
E birth infant weight 
F birth boy marry 
Threshold 3.5 Cluster 6,24 in Appendix. 
Threshold 3.25 
G admission applicant baby birth boy 
campus century child children class- 
room college couple daughter educa- 
tion endowment enroll enrollment es- 
tablishment faculty father gift girl god 
grade homework husband infant ivy kid 
live love man marriage marry math 
mother oxford parent professor psychol- 
ogist scholar scholarship school son stu- 
dent study sugar taught teach teacher 
teaching toy tuition tutor university 
voucher wed wedlock woman 
At threshold 3.75, the origins of the
education (Cluster 6) and childbirth (Cluster
24) clusters are already formed. Within
education, there are subtopics on scholarship,
school and school entrance. They are merged
into Cluster 6 when the threshold is lowered
to 3.5. Cluster 24 is also formed from
Clusters D, E, F of threshold 3.75. Then
Clusters 6 and 24 are merged into Cluster G
when the threshold is lowered again to 3.25.
The clusters' hierarchical relationships are
shown in Figure 9.
Figure 9: Cluster Hierarchy (levels: thresholds 3.25, 3.5 and 3.75)
We may see that the topic is more specialized
when the threshold is high. Clusters which are
merged between thresholds 3.75 and 3.5 were
those within a topic (A, B, C or D, E, F), but
clusters of different topics are merged at
3.25 (Clusters 6 and 24). Thus, the lower the
threshold is, the more ambiguity the cluster
contains. The reason is that two words in
different topics rarely co-occur, so branches
between them enter the input graph only at
low thresholds.
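The inclusion relation behind the hierarchy can be checked by set containment. The sketch below uses the word lists printed above for Clusters A, B and D (threshold 3.75) and Cluster G (threshold 3.25); the function name `parents` is an assumption of this example.

```python
def parents(fine, coarse):
    """For each fine-threshold cluster, list the coarse clusters that contain it."""
    return {
        name: sorted(c for c, words in coarse.items() if set(members) <= set(words))
        for name, members in fine.items()
    }


fine = {
    "A": "admission college scholarship".split(),
    "B": "admission applicant college".split(),
    "D": "birth child children couple marriage marry parent wedlock woman".split(),
}
coarse = {
    "G": (
        "admission applicant baby birth boy campus century child children "
        "classroom college couple daughter education endowment enroll "
        "enrollment establishment faculty father gift girl god grade homework "
        "husband infant ivy kid live love man marriage marry math mother oxford "
        "parent professor psychologist scholar scholarship school son student "
        "study sugar taught teach teacher teaching toy tuition tutor university "
        "voucher wed wedlock woman"
    ).split()
}
print(parents(fine, coarse))
```

Applying the same check between every pair of adjacent thresholds would give the edges of a hierarchy such as the one in Figure 9.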
7 Discussion 
The best threshold differs among topics. Some
examples are:
Economic topic: Although Wall Street Journal
articles have an economic tendency, clusters
on economic topics cannot be found among the
clusters with more than 7 words at threshold
3.5. They appear in clusters at threshold 3.0
as follows:
• accountant audit bracket deduction filer
income offset tax taxpayer
• convert conversion debenture debt holder
outstand prefer redeem redemption repay
tidewater
The threshold should be lower for this topic.
Medical topic: Cluster 1 has too many words.
Although it is a medical cluster, potato
appears in it. At threshold 3.0, Cluster 3 is
completely merged with Cluster 1. The
appearance of potato is a sign that the merge
of two different topics has already begun.
Therefore, a higher threshold is preferred.
Trial topic: Several clusters on trials exist
in the Appendix. They should form one cluster
at a relatively lower threshold.
Consequently, one of the most important future
works is to integrate the two stages, the
first stage of making the input graph with a
static threshold and the second stage of
clustering, into a single stage with a dynamic
threshold.
8 Conclusion 
We have discussed a method to cluster a
co-occurrence graph obtained from a corpus,
from a graph-theoretical viewpoint. A graph
has no ambiguity if its branches, that is,
its co-occurrence relations, are transitive.
This graph-theoretical approach is distinctive
and differs from conventional clustering
methods. We proposed an algorithm to extract
subgraphs whose branches are transitive
co-occurrence relations and discussed its
features. The effectiveness of our method was
examined using a 30-Mbyte corpus.

References 
[Li and Abe, 1996] H. Li and N. Abe.
Clustering Words with the MDL Principle. In
Proceedings of the 16th International
Conference on Computational Linguistics,
vol.1, pp.4-10, 1996.
[Sugihara, 1995] K. Sugihara. A
Graph-Theoretical Method For Monitoring
Concept Formation. Pattern Recognition,
Vol.28, No.11, pp.1635-1643, 1995.
[Tokunaga et al., 1995] T. Tokunaga, M.
Iwayama and H. Tanaka. Automatic Thesaurus
Construction based on Grammatical Relations.
In Proceedings of the International Joint
Conference on Artificial Intelligence '95,
1995.
[Fukumoto and Tsujii, 1994] F. Fukumoto and
J. Tsujii. Automatic Recognition of Verbal
Polysemy. In Proceedings of the 15th
International Conference on Computational
Linguistics, vol.2, pp.762-768, 1994.
[Pereira et al., 1993] F. Pereira, N. Tishby
and L. Lee. Distributional Clustering of
English Words. In Proceedings of the 31st
ACL, pp.183-190, 1993.
[Brown et al., 1992] P. Brown, V. Pietra, et
al. Class-based n-gram Models of Natural
Language. Computational Linguistics, 18 (4),
pp.283-298, 1992.
[Tokunaga and Tanaka, 1990] T. Tokunaga and
H. Tanaka. The Automatic Extraction of
Conceptual Items from Bilingual Dictionaries.
PRICAI, 1990.
[Hindle, 1990] D. Hindle. Noun Classification
from Predicate-Argument Structures. In
Proceedings of the 28th ACL, pp.168-175,
1990.
[Church and Hanks, 1990] K. W. Church and P.
Hanks. Word Association Norms, Mutual
Information, and Lexicography. Computational
Linguistics, vol.16 (1), pp.22-29, 1990.
