Word Sense Disambiguation and Text Segmentation
Based on Lexical Cohesion
OKUMURA Manabu, HONDA Takeo
School of Information Science,
Japan Advanced Institute of Science and Technology
(Tatsunokuchi, Ishikawa 923-12 Japan)
e-mail: {oku,honda}@jaist.ac.jp
Abstract 
In this paper, we describe how word sense ambiguity
can be resolved with the aid of lexical cohesion.
By checking lexical cohesion between the current
word and lexical chains in the order of the
salience, in tandem with generation of lexical
chains, we realize incremental word sense
disambiguation based on contextual information that
lexical chains reveal. Next, we describe how
segment boundaries of a text can be determined with
the aid of lexical cohesion. We can measure the
plausibility of each point in the text as a segment
boundary by computing a degree of agreement of
the start and end points of lexical chains.
1 Introduction 
A text is not a mere set of unrelated sentences. 
Rather, sentences in a text are about the same 
thing and connected to each other[10]. Cohesion
and coherence are said to contribute to such
connection of the sentences. While coherence is a
semantic relationship and needs computationally
expensive processing for identification, cohesion
is a surface relationship among words in a text
and more accessible than coherence. Cohesion
is roughly classified into reference¹, conjunction,
and lexical cohesion².
Except for conjunction, which explicitly indicates the
relationship between sentences, the other two
classes are considered to be similar in that the
relationship between sentences is indicated by two
semantically same (or related) words. But lexical
cohesion is far easier to identify than reference,
because both words in a lexical cohesion relation
appear in a text, while one word in a reference
relation is a pronoun or elided and has less
information to infer the other word in the relation
automatically.
¹ Reference by pronouns and ellipsis in Halliday and
Hasan's classification[3] are included here.
² Reference by full NPs, substitution and lexical
cohesion in Halliday and Hasan's classification are
included here.
Based on this observation, we use lexical cohe- 
sion as a linguistic device for discourse analysis. 
We call a sequence of words which are in lexical
cohesion relation with each other a lexical chain,
following [10]. Lexical chains tend to indicate
portions of a text that form a semantic unit. And so
various lexical chains tend to appear in a text
corresponding to the change of the topic. Therefore,
1. lexical chains provide a local context to aid
in the resolution of word sense ambiguity;
2. lexical chains provide a clue for the
determination of segment boundaries of the text[10].
In this paper, we first describe how word sense
ambiguity can be resolved with the aid of lexical
cohesion. During the process of generating lexical
chains incrementally, they are recorded in a
register in the order of the salience. The salience
of lexical chains is based on their recency and
length. Since the more salient lexical chain
represents the nearby local context, by checking
lexical cohesion between the current word and
lexical chains in the order of the salience, in
tandem with generation of lexical chains, we realize
incremental word sense disambiguation based on
contextual information that lexical chains reveal.
Next, we describe how segment boundaries of
a text can be determined with the aid of lexical
cohesion. Since the start and end points of lexical
chains in the text tend to indicate the start and
end points of the segment, we can measure the
plausibility of each point in the text as a segment
boundary by computing a degree of agreement of
the start and end points of lexical chains.
Morris and Hirst[10] pointed out the above
two roles of lexical cohesion in discourse
analysis and presented a way of computing
lexical chains by using Roget's International
Thesaurus[15]. However, although they mention
the importance of these roles, they did not present
a way of word sense disambiguation based on lexical
cohesion, and they only showed the correspondences
between lexical chains and segment boundaries by
their intuitive analysis.
McRoy's work[8] can be considered as one
that uses the information of lexical cohesion for
word sense disambiguation, but her method does
not take into account the necessity to arrange
lexical chains dynamically. Moreover, her word
sense disambiguation method based on lexical
cohesion is not evaluated fully.
In section two we outline what lexical cohe- 
sion is. In section three we explain the way of 
incremental generation of lexical chains in tan- 
dem with word sense disambiguation and describe 
the result of the evaluation of our disambiguation 
method. In section four we explain the measure 
of the plausibility of segment boundaries and de- 
scribe the result of the evaluation of our measure. 
2 Lexical Cohesion 
Consider the following example, which is an
English translation of a fragment of one of the
Japanese texts that we use for the experiment
later.
    In the universe that continues expanding,
    a number of stars have appeared
    and disappeared again and again. And
    about ten billion years after the birth
    of the universe, in the same way as
    the other stars, a primitive galaxy was
    formed with the primitive sun as the
    center.
Words {universe, star, universe, star, galaxy,
sun} seem to be semantically same or related to
each other and they are included in the same
category in Roget's International Thesaurus. Like
Morris and Hirst, we compute such sequences of
related words (lexical chains) by using a thesaurus
as the knowledge base to take into account not
only the repetition of the same word but also the use
of superordinates, subordinates, and synonyms.
We use a Japanese thesaurus, 'Bunrui-goihyo'[1].
Bunrui-goihyo has a similar organization to
Roget's: it consists of 798 categories
and has a hierarchical structure above this level.
For each word, a list of category numbers which
corresponds to its multiple word senses is given.
We count a sequence of words which are included
in the same category as a lexical chain. It might
be clear that this task is computationally trivial.
Note that we regard only a sequence of words in
the same category as a lexical chain, rather than
using the complete Morris and Hirst's framework
with five types of thesaural relations.
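
As an illustration of counting words in the same category as a chain, the grouping might be sketched as follows. This is a minimal sketch, not the implementation used in the experiments: the miniature thesaurus, its category numbers, and the function name `chains_by_category` are all hypothetical, and dropping categories with a single occurrence is a choice made here only to keep the output small.

```python
from collections import defaultdict

# Hypothetical miniature thesaurus: word -> list of category numbers.
# The numbers are invented for illustration; they are not Bunrui-goihyo entries.
THESAURUS = {
    "universe": [520],
    "star": [520],
    "galaxy": [520],
    "sun": [520],
    "year": [160],
    "birth": [580],
}


def chains_by_category(words, thesaurus):
    """Group word occurrences that fall into the same thesaurus category."""
    chains = defaultdict(list)
    for position, word in enumerate(words):
        for category in thesaurus.get(word, []):
            chains[category].append((position, word))
    # Sketch-only choice: report categories shared by at least two occurrences.
    return {c: members for c, members in chains.items() if len(members) >= 2}


words = ["universe", "star", "year", "birth", "universe", "star", "galaxy", "sun"]
chains = chains_by_category(words, THESAURUS)
chain_words = [word for _, word in chains[520]]
```

With these assumed entries, the single chain that results contains the same six words as the example above.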
The word sense of a word can be determined
in its context. For example, in the context
{universe, star, universe, star, galaxy, sun}, the
word 'earth' has a 'planet' sense, not a 'ground'
one. As is clear from this example, lexical chains
can be used as a contextual aid to resolve word
sense ambiguity[10]. In the generation process
of lexical chains, by choosing the lexical chain
that the current word is added to, its word sense
is determined. Thus, we regard word sense
disambiguation as selecting the most likely category
number of the thesaurus, similarly to [16].
Earlier we proposed an incremental disambiguation
method that uses intrasentential information,
such as selectional restrictions and case
frames[12]. In the next section, we describe an
incremental disambiguation method that uses lexical
chains as intersentential (contextual) information.
3 Generation of Lexical Chains 
In the last section, we showed that lexical chains
can play a role of local context. However, multiple
lexical chains might cooccur in portions of a
text and they might vary in their plausibility as
local context. For this reason, for lexical chains
to function truly as local context, it is necessary
to arrange them in the order of the salience that
indicates the degree of the plausibility. We base
the salience on the following two factors: the
recency and the length. The more recently updated
chains are considered to be the more activated
context in the neighborhood and are given more
salience. The longer chains are considered to be
more about the topic in the neighborhood and
are given more salience.
By checking lexical cohesion between the current
word and lexical chains in the order of the
salience, the lexical chain to which the current
word is added determines its word sense and
plays a role of local context.
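
The ordering of the register by salience might be sketched as follows. How recency and length are combined is not specified in the text, so this sketch assumes that recency dominates and length only breaks ties; the `Chain` class, its field names, and the sample register are likewise hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Chain:
    category: int
    words: list = field(default_factory=list)
    last_updated: int = -1  # index of the sentence that last extended the chain


def salience_key(chain):
    # Assumption of this sketch: recency first, length as a tie-breaker.
    return (chain.last_updated, len(chain.words))


def order_register(chains):
    """Return the chains from most salient to least salient."""
    return sorted(chains, key=salience_key, reverse=True)


register = [
    Chain(category=1, words=["a", "b", "c"], last_updated=2),
    Chain(category=2, words=["d"], last_updated=5),
    Chain(category=3, words=["e", "f"], last_updated=5),
]
ordered = [chain.category for chain in order_register(register)]
```

Here the two chains updated in sentence 5 precede the older one, and the longer of the two comes first. A weighted sum of the two factors would be an equally plausible reading.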
Based on this idea, incremental generation of 
lexical chains realizes incremental word sense
disambiguation using contextual information that
lexical chains reveal. During the generation
of lexical chains, their salience is also
incrementally updated. We think incremental
disambiguation[9] is a better strategy, because
a combinatorial explosion of the number of total
ambiguities might occur if ambiguity is not
resolved as early as possible during the analytical
process. Moreover, incremental word sense
disambiguation is indispensable during the
generation of lexical chains if lexical chains are used
for incremental analysis, because the word sense
ambiguity might cause many undesirable lexical
chains and they might degrade the performance
of the analysis (in this case, the disambiguation
itself).
3.1 The Algorithm 
First of all, a Japanese text is automatically
segmented into a sequence of words by the
morphological analysis[11]. From the result of the
morphological analysis, candidate words are selected
to include in lexical chains. We consider only
nouns, verbs, and adjectives, with some exceptions
such as nouns in adverbial use and verbs in
postpositional use.
Next lexical chains are formed. Lexical cohesion
among candidate words inside a sentence is
first checked by using the thesaurus. Here the
word sense of the current word might be determined.
This preference for lexical cohesion inside
a sentence over the intersentential one reflects our
observation that the former might be tighter.
After the analysis inside a sentence, we try to add
the candidate words to one of the lexical chains
that are recorded in the register in the
order of the above salience. The first chain with
which the current word has the lexical cohesion
relation is selected. The salience of the selected
lexical chain gets higher and then the arrangement in
the register is updated.
Here not only the word sense ambiguity of the
current word is resolved but the word sense of the
ambiguous words in the selected lexical chain can
also be determined. Because the lexical chain gets
higher salience, other word senses of the ambiguous
words in the lexical chain which correspond
to other lexical chains can be rejected. Therefore,
lexical chains can be used not only as prior
context but also as later context for word sense
disambiguation.
If a candidate word cannot be added to any
existing lexical chain, new lexical chains for each
word sense are recorded in the register.
As is clear from the algorithm, rather than the
truly incremental method where the register of
lexical chains is updated word by word in a
sentence, we adopt the incremental method where
updates are performed at the end of each sentence
because we regard intrasentential information as
more important.
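
The register-based procedure might be sketched as follows. This is a simplified variant under stated assumptions, not the system evaluated in the next subsection: it updates the register word by word (the text defers updates to the end of each sentence and first checks cohesion inside the sentence), it models salience by moving the selected chain to the front of the register, and the thesaurus, category numbers, and the name `disambiguate` are hypothetical.

```python
def disambiguate(sentences, thesaurus):
    """Return (senses, register).

    senses maps (sentence index, word index) to a category number,
    or to None while the word is still ambiguous.
    """
    register = []  # chains as dicts, most salient first
    senses = {}
    for s_idx, sentence in enumerate(sentences):
        for w_idx, word in enumerate(sentence):
            candidates = thesaurus.get(word, [])
            if not candidates:
                continue  # not in the thesaurus: not a candidate word
            key = (s_idx, w_idx)
            # First chain, in salience order, that matches one of the senses.
            selected = next(
                (c for c in register if c["category"] in candidates), None
            )
            if selected is None:
                # No cohesion found: record one new chain per word sense.
                senses[key] = candidates[0] if len(candidates) == 1 else None
                for category in candidates:
                    register.insert(0, {"category": category, "members": [key]})
                continue
            selected["members"].append(key)
            # Resolve the current word and any ambiguous earlier members,
            # rejecting their other senses in the other chains.
            for member in selected["members"]:
                if senses.get(member) is None:
                    senses[member] = selected["category"]
                    for other in register:
                        if other is not selected and member in other["members"]:
                            other["members"].remove(member)
            # Raise the salience of the selected chain; drop emptied chains.
            register[:] = [selected] + [
                c for c in register if c is not selected and c["members"]
            ]
    return senses, register


# Hypothetical categories: 1 = celestial bodies, 2 = ground.
THESAURUS = {"universe": [1], "star": [1], "earth": [1, 2], "soil": [2]}

prior, _ = disambiguate([["universe", "star"], ["earth"]], THESAURUS)
later, later_register = disambiguate([["earth"], ["star"]], THESAURUS)
```

In the first call the chain built from 'universe' and 'star' serves as prior context for 'earth'; in the second, 'star' arrives afterwards and serves as later context, so the 'ground' chain of 'earth' is rejected.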
The process of word sense disambiguation using
lexical chains is illustrated in Figure 1. The
most salient lexical chain is located at the top in
the register. In the initial state the word W1
remains ambiguous. When the current unambiguous
word W2 is added, the chain b is selected (top-left).
The chain b becomes the most salient (top-right).
Here the word sense ambiguity of the word
W1 in the chain b is resolved (bottom-left). If the
word to be added is ambiguous (W3), the word
sense corresponding to the more salient lexical
chain (ID31) is selected (bottom-right).
3.2 The Evaluation
We apply the algorithm to five texts. Table 1
shows the system's performance.
The 'correctness' of the disambiguation is
judged by one of the authors. The system's
performance is computed as the quotient of the number
of correctly disambiguated words by the number
of ambiguous words minus the number of
wrongly segmented words (morphological analysis
errors)³.
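
The quotient just defined can be written out as a short sketch. The function name is illustrative. The counts of 82 ambiguous and 42 correctly disambiguated words are those listed for text No.5 in Table 1; the number of wrongly segmented words is not listed in the paper, so the value 4 below is an assumption chosen because it reproduces the reported 53.8 %.

```python
def disambiguation_performance(correct, ambiguous, wrongly_segmented):
    """Percentage of correctly disambiguated words among the ambiguous
    words that were not affected by morphological analysis errors."""
    evaluated = ambiguous - wrongly_segmented
    if evaluated <= 0:
        raise ValueError("no ambiguous words left to evaluate")
    return 100.0 * correct / evaluated


# wrongly_segmented=4 is an assumed value, not a figure from the paper.
score = disambiguation_performance(correct=42, ambiguous=82, wrongly_segmented=4)
```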
Words that remain ambiguous are those that
do not form any lexical chains with other words.
Except for the errors in the morphological analysis,
most of the errors in the disambiguation are
caused by being dragged into the wrong context.
The average performance is 63.4 %. We think
the system's performance is promising for the
following reasons:
1. Lexical cohesion is not the only knowledge
source for word sense disambiguation and
proves to be useful at least as a source
supplementary to our earlier framework that
used case frames[12].
2. In fact, higher performance is reported in
[16], which uses broader context acquired by
training on large corpora, but our method
can attain such a tolerable level of performance
without any training.
However, our salience of lexical chains is, of
course, rather naive and must be refined by using
other kinds of information, such as the Japanese
topical marker 'wa'.
³ The accuracy of the morphological analysis will be
improved by adding new word entries or the like.

[Figure: four states of the register (top-left, top-right,
bottom-left, bottom-right), each showing chains a to d and
the list of ambiguous words, as W2 and W3 are added]
Figure 1: The process of word sense disambiguation

text   number of   number of   number of   number of       system's
       sentences   candidate   ambiguous   correctly       performance
                   words       words       disambiguated   (%)
                                           words
No.1   41          481         166         126             87.5
No.2   26          197          71          32             51.6
No.3   24          212          57          34             64.2
No.4   38          433         123          71             60.1
No.5   24          163          82          42             53.8
number of words that remain ambiguous: 13, 12, 19, 11
(only four of the five values survive in the source, so
they are not assigned to texts here)
Table 1: The performance for the disambiguation

[Figure 2 data: five lexical chains over a text of 24
sentences, given as (start - end) sentence numbers:
(1 - 24), (4 - 13), (14 - [illegible]), (8 - 9), (14 - 18)]
4 Text Segmentation by Lexical Chains
The second important role of lexical chains is that
they provide a clue for the determination of segment
boundaries. Certain spans of sentences in
a text form semantic units and are usually called
segments. It is crucial to identify the segment
boundaries as a first step to construct the
structure of a text[2].
4.1 The Measure for Segment Boundaries
When a portion of a text forms a semantic unit,
there is a tendency for related words to be used.
Therefore, if lexical chains can be found, they
will tend to indicate the segment boundaries of
the text. When a lexical chain ends, there is a
tendency for a segment to end. If a new chain
begins, this might be an indication that a new
segment has begun[10]. Taking into account this
correspondence of lexical chain boundaries to
segment boundaries, we measure the plausibility of
each point in the text as a segment boundary: for
each point between sentences n and n + 1 (where
n ranges from 1 to the number of sentences in the
text minus 1), compute the sum of the number
of lexical chains that end at the sentence n and
the number of lexical chains that begin at the
sentence n + 1. We call this naive measure of a
degree of agreement of the start and end points of
lexical chains, w(n, n + 1), boundary strength, like
[14]. The points in the text are selected in the
order of boundary strength as candidates of
segment boundaries.
Consider for example the five lexical chains in
the imaginary text that consists of 24 sentences in
Figure 2. In this text, the boundary strength can
be computed as follows: w(3,4) = 1, w(7,8) = 1,
w(9,10) = 1, w(13,14) = 3, ....
Figure 2: Lexical chains in the text
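
The boundary strength defined in this subsection can be checked against the example with a short sketch. Chains are represented here as (start, end) pairs of sentence numbers, and the function name is illustrative. The end point of the third chain is not legible in the source, so the value 18 below is an assumption; it does not affect the four strengths quoted in the text.

```python
def boundary_strength(chains, n):
    """w(n, n+1): number of chains that end at sentence n plus
    number of chains that begin at sentence n + 1."""
    ends = sum(1 for start, end in chains if end == n)
    begins = sum(1 for start, end in chains if start == n + 1)
    return ends + begins


# (start, end) sentence numbers of the five chains in the example.
# The end of the third chain (18) is an assumed value.
chains = [(1, 24), (4, 13), (14, 18), (8, 9), (14, 18)]
num_sentences = 24

# One strength per point between sentences n and n + 1.
strengths = {n: boundary_strength(chains, n) for n in range(1, num_sentences)}
```

The point between sentences 13 and 14 obtains the strength 3 because one chain ends at sentence 13 and two chains begin at sentence 14.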
4.2 The Evaluation
We try to segment the texts in section 3.2
and apply the above measure to the lexical
chains that were formed. We pick out three
texts (No.3,4,5), which are from exam questions
on the Japanese language that ask us to partition
the texts into a given number of segments.
The system's performance is judged by the
comparison with segment boundaries marked in an
attached model answer. We also try to segment
two more texts (No.6,7) from the questions.
Here we do not take into account the information
of paragraph boundaries, such as the indentation,
at all, for the following reasons:
• Because our texts are from the exam questions,
many of them have no marks of paragraph
boundaries;
• in the case of Japanese, it is pointed out that
paragraph and segment boundaries do not
always coincide with each other[13].
Table 2 shows the performance in the case where
the system generates the given number of segment
boundaries⁴ in the order of the strength. From
Table 2, we can compute the system's marks as
an examinee in the test that consists of these five
questions. Table 3 shows the performance in the case
where segment boundaries are generated down to
half of the maximum strength. The metrics that
we use for the evaluation are as follows: Recall is
the quotient of the number of correctly identified
boundaries by the total number of correct
boundaries. Precision is the quotient of the number of
correctly identified boundaries by the number of
generated boundaries.
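
The two ways of generating boundaries and the two metrics might be sketched as follows. The function names, the tie-breaking by position, and the reading of 'down to half of the maximum strength' as an inclusive threshold are assumptions of this sketch; the strengths and the model answer in the example are hypothetical and are not the data of the experiment.

```python
def top_k_boundaries(strengths, k):
    """The k points with the highest boundary strength (ties: earlier point first)."""
    ranked = sorted(strengths, key=lambda n: (-strengths[n], n))
    return sorted(ranked[:k])


def boundaries_above_half_max(strengths):
    """All points whose strength is at least half of the maximum strength."""
    threshold = max(strengths.values()) / 2
    return sorted(n for n, w in strengths.items() if w > 0 and w >= threshold)


def recall(generated, correct):
    """Correctly identified boundaries over all correct boundaries."""
    return len(set(generated) & set(correct)) / len(correct) if correct else 0.0


def precision(generated, correct):
    """Correctly identified boundaries over all generated boundaries."""
    return len(set(generated) & set(correct)) / len(generated) if generated else 0.0


# Hypothetical strengths w(n, n+1) keyed by n, and a hypothetical model answer.
strengths = {3: 1, 7: 1, 9: 1, 13: 3, 18: 2}
model_answer = [13, 20]

picked = boundaries_above_half_max(strengths)
rec = recall(picked, model_answer)
prec = precision(picked, model_answer)
```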
⁴ The number of boundaries to be given is the number
of segments given in the question minus 1.

text   given number    number of
       of boundaries   correct boundaries
No.3   1               1
No.4   6               3
No.5   1               0
No.6   4               3
No.7   3               1
Table 2: The performance for the segmentation (1)

text   number of    number of    rec.   prec.
       generated    correct
       boundaries   boundaries
No.3   3            1            1      0.33
No.4   10           3            0.50   0.30
No.5   ?            0            0      0
No.6   7            3            0.75   0.43
No.7   5            1            0.33   0.20
(The No.5 row and the rec. and prec. columns are damaged in
the source; they are restored here from the boundary counts,
the totals in Table 2, and the averages reported in the text.
The number of generated boundaries for No.5 is not recoverable.)
Table 3: The performance for the segmentation (2)

We think the poor result for the text No.5
might be caused by the difficulty of the text
itself, because it is written by one of the
most difficult writers in Japan, KOBAYASHI
Hideo. Table 2 shows that our system gets
8(1+3+3+1)/15(1+6+1+4+3) = 53 % in the
test. From Table 3, the average recall and
precision rates are 0.52 and 0.25 respectively. Of
course these results are unsatisfactory, but we
think this measure for segment boundaries is
promising and useful as a preliminary one.
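
The test mark quoted above can be rechecked by simple arithmetic. The per-text counts below follow the sums printed in the text together with Table 2 (the zero for text No.5 is left out of the printed sum); the variable names are illustrative.

```python
# Correct and given boundaries for texts No.3 to No.7.
correct = [1, 3, 0, 3, 1]
given = [1, 6, 1, 4, 3]

# Mark of the system as an examinee, in percent.
mark = 100 * sum(correct) / sum(given)
```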
Since lexical chains are considered to differ
in their degree of contribution to segment
boundaries, we are now refining the measure by
taking into account their importance. We base
the importance of lexical chains on the following
two factors:
1. The lexical chains that include more words
with the topical marker 'wa' get more importance.
2. The longer lexical chains tend to represent a
semantic unit and get more importance.
The start and end points of the more important
lexical chains can get more boundary
strength. This refinement of the measure is in
progress and yields a certain extent of improvement
of the system's performance.
Moreover, this evaluation method is not
necessarily adequate, since partitioning into a larger
number of smaller segments might be possible
and be necessary for the given texts. And so we
will have to consider an evaluation method in which
the agreement with human subjects is tested in
future. However, since human subjects do not
always agree with each other on segmentation[6, 4,
14], our evaluation method using the texts in the
questions with model answers is considered to be
a good simplification.
Several other methods of text segmentation
have been proposed. Kozima[7] and Youmans[17]
proposed statistical measures (named
LCP and VMP respectively), which indicate the
plausibility of text points as a segment boundary.
Their hills or valleys tend to indicate segment
boundaries. However, they only showed the
correlation between their measures and segment
boundaries by their intuitive analysis of a few
sample texts, and so we cannot compare our system's
performance with theirs precisely.
Hearst[5] independently proposes a similar
measure for text segmentation and evaluates the
performance of her method with precision and recall
rates. However, her segmentation method
depends heavily on the information of paragraph
boundaries and always partitions a text at the
points of paragraph boundaries.
5 Conclusion 
We showed that lexical cohesion can be used as a
knowledge source for word sense disambiguation
and text segmentation. We think our method is
promising, although only partially successful
results have been obtained in the experiments so far.
Here we reported some preliminary positive results
and made some suggestions for how to improve
the method in future. The improvement of
the method is now under way.
In addition, because computation of lexical
chains depends completely on the thesaurus used,
we think a comparison among the results obtained
with different thesauri would be insightful, and we
are now planning one. It is also necessary to
incorporate other textual information, such as clue
words, which is computationally accessible, to
improve the performance.
References 
[1] Bunrui-Goihyo. Shuei Shuppan, 1964. in
Japanese.
[2] B.J. Grosz and C.L. Sidner. Attention,
intentions, and the structure of discourse.
Computational Linguistics, 12(3):175-204,
1986.
[3] M.A.K. Halliday and R. Hasan. Cohesion
in English. Longman, 1976.
[4] M.A. Hearst. Texttiling: A quantitative
approach to discourse segmentation. Technical
Report 93/24, University of California,
Berkeley, 1993.
[5] M.A. Hearst. Multi-paragraph segmentation
of expository texts. Technical Report
94/790, University of California, Berkeley,
1994.
[6] J. Hirschberg and B. Grosz. Intonational
features of local and global discourse structure.
In Proc. of the Darpa Workshop on Speech
and Natural Language, pages 441-446, 1992.
[7] H. Kozima. Text segmentation based on
similarity between words. In Proc. of the 31st
Annual Meeting of the Association for
Computational Linguistics, pages 286-288, 1993.
[8] S.W. McRoy. Using multiple knowledge
sources for word sense discrimination.
Computational Linguistics, 18(1):1-30, 1992.
[9] C.S. Mellish. Computer Interpretation of
Natural Language Descriptions. Ellis
Horwood, 1985.
[10] J. Morris and G. Hirst. Lexical cohesion
computed by thesaural relations as an
indicator of the structure of text. Computational
Linguistics, 17(1):21-48, 1991.
[11] Nagao Lab., Kyoto University. Japanese
Morphological Analysis System JUMAN
Manual Version 1.0, 1993. in Japanese.
[12] M. Okumura and H. Tanaka. Towards
incremental disambiguation with a generalized
discrimination network. In Proc. of the
8th National Conference on Artificial
Intelligence, pages 990-995, 1990.
[13] T. Ookuma. Gengo tan'i toshite no
bunshou. Nihongo gaku, 11(4):20-25, 1992. in
Japanese.
[14] R.J. Passonneau. Intention-based
segmentation: Human reliability and correlation with
linguistic cues. In Proc. of the 31st Annual
Meeting of the Association for Computational
Linguistics, pages 148-155, 1993.
[15] P. Roget. Roget's International Thesaurus,
Fourth Edition. Harper and Row Publishers
Inc., 1977.
[16] D. Yarowsky. Word-sense disambiguation
using statistical models of Roget's categories
trained on large corpora. In Proc. of the 14th
International Conference on Computational
Linguistics, pages 454-460, 1992.
[17] G. Youmans. A new tool for discourse
analysis: The vocabulary-management profile.
Language, 67:763-789, 1991.
